INTRODUCTION


We are applying a comprehensive but focused structural genomics approach to determine the atomic-resolution crystal structures of key bacterial virulence factors from four high-priority bacterial pathogens. These studies will expedite anti-toxin and vaccine design in several ways. Structural data will be made available to all appropriate groups for use in structure-based drug design. In addition, we will generate a large library of expression vectors for virulence factors, as well as research quantities of pure proteins, which can readily be adapted for vaccine production and are also likely to have applications in detector design. In the broader and longer term, the accumulated structural information will generate important, testable hypotheses that will increase our understanding of the molecular mechanisms of pathogenicity, putting us in a stronger position to anticipate and react to emerging pathogens.

BODY 

Task 1: Atomic resolution crystal structures of virulence factors:

Target Selection. We performed a detailed analysis of the Bacillus anthracis virulence plasmids. Using a variety of bioinformatics tools, we identified the possible functions of about 40 proteins and discovered several likely operons on the pXO1 plasmid. The most interesting discoveries include numerous DNA-processing enzymes, several new regulatory proteins and elements of the type IV secretion system. The results of the pXO1 analysis are now being prepared for publication (a draft manuscript describing this work is provided in Appendix 1), and a continuation of the analysis to the pXO2 plasmid is now being finalized.

We have identified a new domain in a broad range of bacterial proteins, as well as in isolated archaeal and plant proteins. Its presence in the virulence-related pXO1 plasmid of Bacillus anthracis (pXO1-01), as well as in several other pathogens, makes it a possible drug target. We term the new domain the nuclease-related domain (NERD) because of its distant similarity to endonucleases. This work has been published in Trends in Biochemical Sciences (Grynberg & Godzik, 29: 106-110 (2004)) and is included as Appendix 2.

Cloning and expression of novel B. anthracis proteins. Two target lists were generated from the bioinformatics analyses: proteins with distant homologues of known structure in the Protein Data Bank (List 1), and proteins with no homologues (List 2). Research fellows each chose 5 targets from List 1 and 3 from List 2. The work in progress is summarized below. For the most part, cloning was successful, and expression trials are at various stages, with several proteins undergoing crystallization and NMR trials. Crystal structures of three novel proteins have been determined (described in Appendices 3 and 4). The work on pXO1-118 and pXO2-61 has led to a focus on the structure of the “master regulator” of the toxin genes, AtxA, and we have made a concerted effort to express full-length AtxA and domain fragments in different hosts and in a cell-free system (described in detail in Appendix 5). Our hit rate for soluble protein expression and crystallization has been somewhat disappointing compared with our general success rate for other bacterial and eukaryotic proteins. The reasons for this are unclear at this stage, although several of the proteins certainly appear to be toxic to the host. We are now seeking improved general expression systems, including cell-free, insect-cell and Bacillus megaterium expression.
Summary of cloning, expression and purification of novel pXO1 proteins:


pXO1-1 has a single transmembrane region and could only be expressed as insoluble protein. Initial trials using extraction with a high concentration of the detergent Triton X-100 failed to produce a significant amount of soluble protein. Expression of a fragment excluding the predicted transmembrane region also produced insoluble inclusion bodies.

pXO1-37 (acetyltransferase). His-tagged full-length pXO1-37 (residues 1-193) was overexpressed in soluble form in E. coli at 30°C. A previous instability problem on concentrating the protein was solved by adding 100 mM DTT to the protein solution after Ni-column purification. Crystallization setups have begun.

pXO1-47 (transcriptional activator of multidrug efflux). His-tagged full-length pXO1-47 (residues 1-201) was overexpressed in inclusion bodies. Varying the expression conditions did not yield soluble protein. pXO1-47 was purified under denaturing conditions on a Ni column and refolded as soluble protein. A differential scanning calorimetry (DSC) experiment is underway to confirm correct folding.

pXO1-87 and pXO1-99 were expressed but proved difficult to purify. Both proteins co-purified with a 60 kDa protein, suspected to be a heat shock protein or chaperonin. High-resolution columns (Superdex 200 HR gel filtration, Mono S and Mono Q) could not remove the contaminant. Mg2+-ATP has been shown to enhance dissociation of the E. coli chaperonin from proteins with large exposed hydrophobic surfaces, and it will be used in upcoming purifications of pXO1-87 and pXO1-99.

pXO1-97 was cloned and gave soluble protein; structural analysis by NMR is in progress.

pXO1-104. His-tagged full-length pXO1-104 (residues 1-61) was overexpressed as inclusion bodies. Other conditions were tried to obtain soluble expression, without success. Refolding experiments are underway.

pXO1-109/PagR. Cloned and expressed in soluble form; crystallization trials in progress.

pXO1-111 (homologous to PA domain 4). Cloned and expressed in soluble form; crystallization trials in progress.

pXO1-116. Cloning unsuccessful so far.

pXO1-117 and pXO1-143. Cloning successful, but no expression in E. coli.

pXO1-118 (and pXO2-61) have been crystallized and their structures determined (see Appendix 3).
pXO1-121. His-tagged full-length pXO1-121 (residues 1-58) was overexpressed as inclusion bodies. Other conditions were tried to express it in soluble form, without success. Refolding is underway.

pXO1-125. Cloning and expression successful; the protein is insoluble and could not be refolded.

Cloning of all of the following target genes as full-length proteins has been completed, and expression trials are in progress. All genes have now been subcloned into the bacterial expression vector pET28a: pXO1-96 (274 residues, homologue of a putative transposase); pXO1-103 (317 residues, homologue of site-specific recombinases); pXO1-105 (67 residues, homologue of regulators of stationary-phase/sporulation gene expression); pXO1-126 (151 residues, homologue of the uncharacterized ACR ML0644); pXO1-130 (237 residues, predicted periplasmic or secreted protein); pXO1-04, pXO1-07, pXO1-10, pXO1-32, pXO1-90, pXO1-94, pXO1-98, a truncated form of pXO1-98, pXO1-117, pXO1-124, pXO1-127 and pXO1-132.

Structural studies of inhibitor binding to Lethal Factor

Compounds NSC 12155, NSC 357756 and NSC 357777 had been identified as the top three hits in the USAMRIID high-throughput screen of the NCI small-molecule library for LF inhibition.

We determined the crystal structure of LF-12155-Zn (wild-type LF bound to NSC 12155 in the presence of zinc); this work, in collaboration with Drs. Gussio and Bavari at USAMRIID, has recently been published (Panchal et al. Nat. Struct. Mol. Biol. 11: 67-72 (2004); Appendix 6). It showed that this compound can bind LF and inhibit up to 95% of its native catalytic activity. The compound does not require zinc to bind to the active site of LF, and appears to recognize the substrate-binding site immediately adjacent to the catalytic zinc site through hydrophobic interactions.

Currently, we are working on the structures of LF-357756-Zn and LF-357777-Zn (complexes of wild-type LF bound to NSC 357756 or NSC 357777 in the presence of zinc); model refinement is continuing, and new data are being collected. So far, electron density maps indicate that NSC 357756 binds in the immediate vicinity of the catalytic site and may coordinate the zinc atom. NSC 357777, however, appears to rely more on hydrophobic interactions to recognize the substrate-binding site in LF, while still binding close to the zinc atom. The current focus is on NSC 357756, which has been shown to have better cell permeability than NSC 12155 and better inhibitory activity than NSC 357777 (unpublished data from USAMRIID colleagues).

Crystal structure of an anthrax toxin-host cell receptor complex

Two closely related host cell receptor molecules, TEM8 and CMG2, bind to PA with high affinity and are required for toxicity. We determined the crystal structure of the PA-CMG2 complex at 2.5 Å resolution (Appendix 8). The structure reveals an extensive receptor-pathogen interaction surface that mimics the non-pathogenic recognition of the extracellular matrix by integrins. The binding surface is highly conserved in the two receptors and across species, but quite different in the integrin domains, which explains the specificity of the interaction. CMG2 engages two domains of PA, and modeling of the receptor-bound PA63 heptamer suggests that the receptor acts as a pH-sensitive chaperone to ensure accurate and timely membrane insertion.

Task 2: Collect expression vectors and purified proteins into a library suitable for use by other interested groups, and post the information on our website.


This task has been accomplished for the B. anthracis pXO1 proteins; target selection and experimental updates are made monthly in light of new cloning, expression and structural data. We will make this information publicly available if USAMRMC deems this appropriate.

Task 3: Develop a computational database of virulence-related genes

We have developed a preliminary version of the virulence factor database (VirFact). It is available at http://virfact.burnham.org/. Currently, this database contains information on about 60 virulence factors and about 10 pathogenicity islands, selected mostly on the basis of literature searches (all proteins in the database can be listed by entering an empty string in the search window). Each protein in the database was annotated using modeling, distant homology recognition and sequence analysis tools. One of the tools available online allows a new genome to be scanned for homologues of virulence factors; several potential virulence factors were identified this way in the Francisella genome.
Task 4: Form a consortium of groups with similar interests who are funded from other sources, developing a common website containing target selections and project status.


We plan to hold an inaugural meeting this Fall. In the first instance, we will bring together investigators from DHHS Region IX (AZ, CA, HI and NV), as this coincides with our efforts to create an NIAID Regional Center of Excellence.


Key Research Accomplishments

• In-depth annotation of the anthrax virulence plasmid, and identification of novel domains.

• Successful cloning and/or expression of 35 proteins and domain fragments from the B. anthracis virulence plasmid, pXO1.

• Identification, crystal structure determination and characterization of a putative B. anthracis CO2 sensor.

• Crystal structure of a B. anthracis amidase homologous to a bactericidal phage enzyme.

• Crystal structure of anthrax PA in complex with its host receptor.

• 6 crystal structures of anthrax Lethal Factor in complex with inhibitors.


Reportable Outcomes

Published manuscripts:

1. Panchal RG, Hermone AR, Nguyen TL, Wong TY, Schwarzenbacher R, Schmidt J, Lane D, McGrath C, Turk BE, Burnett J, Aman MJ, Little S, Sausville EA, Zaharevitz DW, Cantley LC, Liddington RC, Gussio R, Bavari S. Identification of small molecule inhibitors of anthrax lethal factor. Nat Struct Mol Biol. 11:67-72 (2004).

2. Turk BE, Wong TY, Schwarzenbacher R, Jarrell ET, Leppla SH, Collier RJ, Liddington RC, Cantley LC. The structural basis for substrate and inhibitor selectivity of the anthrax lethal factor. Nat Struct Mol Biol. 11:60-66 (2004).

3. Grynberg M, Godzik A. NERD: a DNA processing-related domain present in the anthrax virulence plasmid, pXO1. Trends Biochem Sci. 29:106-110 (2004).

Manuscript under review:

1. Santelli E, Bankston LA, Leppla SH, Liddington RC. Crystal structure of an anthrax toxin-host cell receptor complex. Submitted to Nature.

Reagents generated:

• Expression vectors for 35 proteins from the B. anthracis pXO1 plasmid.

• Atomic coordinates have been deposited in the Protein Data Bank for anthrax Lethal Factor-inhibitor complexes and the Protective Antigen-host cell receptor complex.

Funding applied for:

We developed the initial work on pXO1-118, pXO2-61 and AtxA funded by this grant into an in-depth structure-function study, proposed in an application for a Program Project grant from NIAID led by Dr. Liddington (P01 AI 55789-01). We recently received word that this proposal has been funded; it will start this Summer.

Our work on inhibitors of anthrax Lethal Factor played a large part in our successful application to NIAID to develop a novel class of inhibitors using in silico and NMR-based methods combined with crystallography (U19 AI56385-01, Dr. Alex Strongin, P.I.). Our general approach also led to a successful application for a grant to develop novel therapeutic treatments for smallpox (U01 AI061139, P.I. Dr. Alex Strongin).
On the strength of the work funded by this grant and others, we have been invited to participate in a Regional Center of Excellence proposal for Region IX that will be submitted this Fall.

Conclusions 

In this first year of funding, we have focused our attention on target selection, expression, purification and crystallization of proteins encoded by the Bacillus anthracis pXO1 plasmid. We have cloned and expressed a total of 35 new proteins, and structural analysis of several of these is underway. Currently, 3 new crystal structures, as well as 6 crystal structures of anthrax Lethal Factor in complex with small-molecule inhibitors provided by our collaborators at USAMRIID and elsewhere, are essentially complete. We have also determined the first crystal structure of a complex between anthrax Protective Antigen and its host cell receptor (under review at Nature). In the next year, in addition to continuing the B. anthracis work, we propose to focus on the newly annotated F. tularensis genome and to apply a similar set of tools to elucidate virulence in this poorly studied organism.


So what section: Knowledge of protein structures and inhibitor complexes at atomic resolution is typically a prerequisite for rational drug design. Therapeutics do not exist for any of the major pathogens likely to be used in biowarfare or bioterrorism. Our efforts in the first year were directed toward the design of anthrax therapeutics, for which the need is compelling, since antibiotic treatments have limited effectiveness and vaccines are problematic. The work described in this report provides the first stages of target identification and characterization, and the structures we have already determined bear directly on inhibitor design. The work has also allowed us to leverage funding through several NIAID-funded research projects to carry out in-depth structure-function studies that will enable the next stages of drug design.

References 

References are included in the appropriate appendices.

Appendices 

Appendix 1: Surprising connections: in-depth analysis of the Bacillus anthracis pXO1 plasmid (manuscript in preparation)


Appendix 2: NERD: a DNA processing-related domain present in the anthrax virulence plasmid, pXO1 (published manuscript)

Appendix 3: Discovery, crystal structures and characterization of a putative CO2 sensor domain, “BACO”
Appendix 4: Crystal structure of B. anthracis amidase homologous to a bacteriophage lysin


Appendix 5: Structural studies of AtxA, a member of the PRD family of transcriptional activators.

Appendix 6: “Identification of small molecule inhibitors of anthrax lethal factor” (published paper)

Appendix 7: The structural basis for substrate and inhibitor selectivity of the anthrax lethal factor (published paper).

Appendix 8: Crystal structure of an anthrax toxin-host cell receptor complex (manuscript submitted to Nature)
Surprising connections: in-depth analysis of the Bacillus anthracis pXO1 plasmid
ABSTRACT

Anthrax is caused by Bacillus anthracis. Virulence of this bacterium has been associated with two plasmids, pXO1 and pXO2. We used the DNA and proteome sequences of pXO1 to understand where the plasmid comes from and what the functions of its so-called unknown open reading frames are. For this purpose, we used context analysis and distant homology tools that allowed us to discover, among other things, two type IV secretion system-like operons. We also substantially improved the description of many pXO1 ORFs and showed the plasmid's mosaic nature.

[Supplemental material available online at bioinformatics.ljcrf.edu/pX01. The genome sequence data are available at NCBI:

http://www.ncbi.nlm.nih.gov/genomes/framik.cgi?db=Genome&gi=16452.]

Keywords: anthrax, Bacillus anthracis, pathogenicity, virulence, pXO1, type IV secretion system, type IV pilus assembly system, context analysis, distant homology, ArsR, SmtB, regulators.

*Corresponding author. E-mail: adam@burnham.org; FAX: +1 (858) 713-9930.

INTRODUCTION 

Anthrax is an ancient disease primarily affecting herbivores but also attacking other mammals, including humans. Scientific studies of this disease started in the middle of the 18th century, and almost a century later the agent was recognized as Bacillus anthracis, a bacterium causing many diverse manifestations of the same infection. Starting with the pioneering work of Pasteur (1881), the quest for an effective vaccine progressed considerably up to the end of the 1960s, leading not only to the creation of human anthrax vaccines but also to the elucidation of the three-component nature of the anthrax toxin (for review see [Turnbull, 2002]). A few incidents and the threat of bioterrorism, however, started an era of even more intensive work on B. anthracis. The sequencing projects [Okinaka, 1999; Pannucci, 2002; Read, 2003] and projects focusing on the mechanisms of toxin function, regulation and release (for review see [Koehler, 2002]) have significantly advanced the understanding of anthrax pathogenesis. The B. anthracis genome consists of a 5.23-Mb chromosome and two megaplasmids, pXO1 (181.7 kb) and pXO2 (94.8 kb) [Okinaka, 1999; Read, 2003]. Genetic analysis has focused mainly on the toxins (PagA, LEF, CyaA), cell envelope and germination genes (Cap, S-layer and Ger proteins), and the regulatory mechanisms triggering virulence. From genetic experiments and informatic analyses, we know that both chromosomally and plasmid-encoded factors control the toxin genes. The chromosomal abrB gene encodes a negative regulator of the toxin genes (pagA, cyaA and lef) as well as of the atxA gene [Saile, 2002], the main pXO1-encoded activator of the pXO1-encoded toxin genes [Dai, 1995]. AtxA was also shown to activate pXO2 genes: it triggers the production of capsule proteins via activation of the homologous activators from the pXO2 plasmid, AcpA and AcpB [Drysdale, 2004]. In total, at least 7 pXO1 and 10 pXO2 genes are regulated by the AtxA protein [Bourgogne, 2003]. In addition, another regulator, the PagR protein, has a weak negative effect on the pag operon and regulates cell envelope genes, including the S-layer genes sap and eag [Hoffmaster, 1999; Mignot, 2003]. Virulence-related genes are also known to be regulated by temperature and CO2/bicarbonate levels [Bartkus, 1994; Sirard, 1994].

Recent work has focused on bioinformatic analyses of the anthrax genome and phylogenetically related plasmids [Ariel, 2002; Berry, 2002; Ariel, 2003; Rasko, 2004]. These studies confirmed that B. anthracis is closely related to B. thuringiensis and B. cereus, and revealed many previously unknown features of their plasmids, pXO1, pBtoxis and pBc10987, respectively. These works did not answer our questions: (I) where do the elements of the pXO1 plasmid come from, and (II) what do the unknown genes encoded on pXO1 do? For our analysis, we used the most recent sequence of the pXO1 plasmid [Read, 2003], and we did not focus on similarities to plasmids of other bacilli. We were interested in operon conservation and twilight-zone homologies that can reveal potentially important features for the virulence function of this plasmid.


RESULTS 

Statistics 

DNA level analysis

Previous studies have analyzed the DNA sequence of the pXO1 plasmid [Okinaka, 1999; Read, 2002; Pannucci, 2002]: ORF prediction programs were applied, DNA motifs were identified, and promoter elements were already linked to ORFs. Our analysis of the DNA sequence focused on two aspects. First, we wanted to locate the origin of replication, since no genes obviously involved in this process could be detected. Second, we searched for specific DNA regions related to pathogenicity.


Figure: Summary of the pXO1 annotation (image not reproduced; panel title: "pXO1 plasmid annotation").

Figure. Improvement of pXO1 plasmid annotation using the FFAS profile-profile algorithm [Rychlewski, 2000] and context analysis (ERGO).


One goal was to find proteins directly involved in plasmid replication. Unfortunately, we could not detect any. Therefore, we used the Oriloc program to predict the origin of replication [Frank, 2000]. In bacteria, the leading strand of replication is enriched in keto bases (G, T), while the lagging strand is enriched in amino bases (A, C) [Rocha, 1999]. This compositional asymmetry allows the identification of probable origin and termination sites of replication. Oriloc analysis indicated a potential origin of replication between bases 66538 and 66558, which is quite close to the origin predicted earlier by Berry and colleagues (region 60955-62192) [Berry, 2002]. The origin is predicted in the neighborhood of hypothetical proteins with no recognizable homology to proteins in publicly available databases; it is located between ORFs BXA0076 (previously pXO1-51) and BXA0077 (pXO1-52). The termination of replication may lie around position 173914 of the pXO1 plasmid, between genes BXA0206 (pXO1-137) and BXA0207 (pXO1-138), which encode an RNA-binding Hfq (Host Factor I) protein and a transcriptional regulator of the ArsR family, respectively.
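The compositional-asymmetry principle described above can be illustrated with a cumulative skew walk. The Python sketch below is not the Oriloc algorithm; it is a simplified stand-in that sums a windowed keto skew, (G+T-A-C)/(G+T+A+C), along the top strand and takes the minimum of the running sum as the putative origin and the maximum as the putative terminus. The function names and the 1 kb window are hypothetical choices, and the input is assumed to be one linearized copy of the circular plasmid.

```python
from typing import List, Tuple


def cumulative_keto_skew(seq: str, window: int = 1000) -> List[float]:
    """Running sum of windowed keto skew (G+T - A-C) / (G+T + A+C).

    Element k is the cumulative skew at position k * window, so the list
    starts with 0.0 for position 0. Hypothetical helper, not Oriloc.
    """
    seq = seq.upper()
    cum = [0.0]
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        keto = chunk.count("G") + chunk.count("T")
        amino = chunk.count("A") + chunk.count("C")
        # Windows with no A/C/G/T (e.g. only N) contribute no skew.
        step = (keto - amino) / (keto + amino) if keto + amino else 0.0
        cum.append(cum[-1] + step)
    return cum


def predict_ori_ter(seq: str, window: int = 1000) -> Tuple[int, int]:
    """Return approximate 0-based (origin, terminus) coordinates.

    Assumes the top strand is the leading strand from origin to terminus,
    so the cumulative keto skew reaches its minimum at the origin and its
    maximum at the terminus.
    """
    cum = cumulative_keto_skew(seq, window)
    k_min = min(range(len(cum)), key=cum.__getitem__)
    k_max = max(range(len(cum)), key=cum.__getitem__)
    return min(k_min * window, len(seq)), min(k_max * window, len(seq))


if __name__ == "__main__":
    # Toy sequence: amino-rich, then keto-rich, then amino-rich again.
    toy = "A" * 3000 + "G" * 5000 + "C" * 2000
    print(predict_ori_ter(toy))  # (3000, 8000)
```

Resolution is limited to roughly one window, and a real plasmid sequence gives a much noisier curve than this toy input.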

At the DNA level, we were interested in finding regions connected to the regulation of virulence. We focused on genes regulated by AtxA [Bourgogne, 2003], with the goal of characterizing DNA regions involved in AtxA binding. For this purpose, we collected the intergenic sequences preceding the AtxA-dependent genes (see Table 1 in [Bourgogne, 2003]) and analyzed them using the MEME [Bailey, 1994] and MITRA [Eskin, 2002] programs. The only common motif that we could find was ANGGAG, which was located at widely varying distances (5-600 bp) from the putative ATG start codon. These large differences in the location of the ANGGAG motif could be explained by unrecognized ORFs located upstream of some of the analyzed genes, in the same operon. Another possibility is that this signal is spurious. Deletion experiments on these cis elements would be needed to test our hypothesis.
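Once a candidate motif such as ANGGAG is in hand, its positional spread across upstream regions can be checked with a simple scan. The Python sketch below does not reproduce MEME or MITRA, which discover motifs de novo; it only locates a known degenerate motif. It assumes each input string is the intergenic region ending immediately upstream (5') of the putative ATG, and it reports distance as the number of bases between the motif's last base and the ATG. The function names and this distance convention are illustrative assumptions.

```python
import re
from typing import Dict, List

# ANGGAG, with N = any nucleotide; the lookahead allows overlapping hits.
ANGGAG = re.compile(r"(?=(A[ACGT]GGAG))")


def motif_distances(upstream: str) -> List[int]:
    """Spacer lengths (bp) between each ANGGAG hit and the start codon.

    Assumes `upstream` ends immediately before the ATG, so a hit whose
    last base is adjacent to the ATG has distance 0.
    """
    seq = upstream.upper()
    return [len(seq) - (m.start() + 6) for m in ANGGAG.finditer(seq)]


def scan_upstream_regions(regions: Dict[str, str]) -> Dict[str, List[int]]:
    """Map each gene name to the distances of all ANGGAG hits upstream of it."""
    return {name: motif_distances(seq) for name, seq in regions.items()}


if __name__ == "__main__":
    # Hypothetical toy regions, not real pXO1 sequences.
    toy = {"geneA": "TTAAGGAGTTTTT", "geneB": "ACGGAGAAAAA", "geneC": "CCCCCC"}
    print(scan_upstream_regions(toy))  # {'geneA': [5], 'geneB': [5], 'geneC': []}
```

Run over the real AtxA-dependent upstream regions, a scan of this kind would simply tabulate the 5-600 bp spread reported above; it cannot by itself tell a genuine regulatory signal from a spurious one.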

Protein level analysis

In our efforts to understand the function of pXO1, we focused on the plasmid proteome, looking for a consistent picture of its function and phylogeny. We used context analysis methods to reveal interesting connections with other regulons. As mentioned in the Introduction, we did not focus on obvious similarities to other bacilli and their plasmids, although these data are available in Figure 1. From the analysis of all pXO1 ORFs, two major features emerged. First, pXO1 has genes collected from many different species (Figure 1). It also contains many truncated and mutated sequences with homology to existing genes (Supplementary data). Second, pXO1 has a significant number (15%) of proteins related to DNA metabolism (Supplementary data).

Gram-negative bacteria connection

BXA0010, BXA0013 and BXA0015 are similar to proteins from Xanthomonas and Burkholderia (Figure 2A) and from the Pseudomonas group, all of which are Gram-negative proteobacteria. BXA0010 and BXA0013 are homologues of Xanthomonas orf8 (gi number: 21242978), a Burkholderia protein (gi number: 22985671) and a number of Pseudomonas proteins (gi numbers: 24461733, 37955709, 40019205). All of these proteins belong to superfamily II of DNA/RNA helicases, and BXA0010 appears to be a duplication of the middle part of BXA0013. Between these two genes in B. anthracis there is an inserted reverse transcriptase gene (BXA0011). One can hypothesize that this insertion occurred after the duplication and disrupted the BXA0010 gene. BXA0013 forms an operon with BXA0015; this arrangement is also conserved as an operon in the species mentioned above (Xanthomonas 21242977, Burkholderia 22985672, and Pseudomonas spp. 24461735, 37955708, 40019206). BXA0015 has strong similarity to the N-terminal part of its homologues. This region encodes the coenzyme-binding domain of various DNA methyltransferases (FFAS score: -42.200 to 1g38A). Unlike B. anthracis, the other species retain numerous additional proteins in this operon. Unfortunately, even the most advanced tools could not recognize any homology of these other ORFs to known proteins; therefore, their function remains a mystery. From the functions of the known members of this operon, one can infer a DNA-modifying function.
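The operon-conservation argument above (BXA0013 and BXA0015 adjacent in B. anthracis and in the other genomes) rests on conserved gene neighborhoods. The Python sketch below illustrates that idea with a simple adjacency count; it is not the ERGO context-analysis method, it ignores strand and intergenic distance, and the family labels, genome names and gene orders are hypothetical toy data, not the real annotations.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, FrozenSet, List, Set


def adjacent_family_pairs(order: List[str]) -> Set[FrozenSet[str]]:
    """Unordered pairs of protein families encoded by neighboring genes."""
    return {frozenset(pair) for pair in zip(order, order[1:]) if pair[0] != pair[1]}


def conserved_pairs(
    genomes: Dict[str, List[str]], min_genomes: int = 2
) -> Dict[FrozenSet[str], List[str]]:
    """Family pairs that are adjacent in at least `min_genomes` genomes."""
    support: DefaultDict[FrozenSet[str], List[str]] = defaultdict(list)
    for name, order in genomes.items():
        # Each genome votes at most once per pair (sets remove repeats).
        for pair in adjacent_family_pairs(order):
            support[pair].append(name)
    return {pair: sorted(names) for pair, names in support.items() if len(names) >= min_genomes}


# Hypothetical toy gene orders (family labels only, not real annotations).
TOY_GENOMES = {
    "B_anthracis": ["helicase", "RT", "helicase", "MTase"],
    "Xanthomonas": ["helicase", "MTase", "unknown1"],
    "Burkholderia": ["unknown2", "helicase", "MTase"],
}

if __name__ == "__main__":
    for pair, names in conserved_pairs(TOY_GENOMES).items():
        print(sorted(pair), names)
```

In this toy input only the helicase-methyltransferase pair is adjacent in more than one genome, which is the kind of signal the text describes; a fuller analysis might also account for how closely related the compared genomes are.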

Actinobacteria/Cyanobacteria connection

BXA0032 and BXA0033, if fused, could form a member of the COG0175 family. These proteins belong to the 3'-phosphoadenosine 5'-phosphosulfate sulfotransferase (PAPS reductase)/FAD synthetase group of enzymes, which are linked to an ATPase involved in DNA repair/chromosome segregation in Anabaena spp., Nostoc spp., Bacillus stearothermophilus and Streptomyces avermitilis. The functions of the other proteins in this regulon are unknown. In B. anthracis, however, the gene pair is located close to BXA0034. We described the members of this family as a new HEPN nucleotide-binding domain [Grynberg, 2003], and a connection with BXA0037, a nucleotidyltransferase domain protein, is clear. As a complex, they may catalyze the addition of a nucleotidyl group to unknown substrates, perhaps antibiotics or other toxic compounds, as their structural homolog kanamycin nucleotidyltransferase does [Matsumura, 1984]. The function of these operons remains unknown.

Bacilli  connection 

BXA0091 and BXA0094 are homologous to each other and to proteins from several
other bacilli: Enterococcus, Listeria, Lactococcus, Lactobacillus and other Bacillus
species. This protein family is of unknown function and is hypothesized to be
extracellular [Nakai, 1999]. Not only in B.anthracis but also in E.faecalis and
B.thuringiensis, this gene is represented by at least two copies in each operon. In
B.anthracis, B.thuringiensis, Listeria innocua and E.faecalis, the BXA0091 homologues
colocalize with a surface layer domain protein (FFAS score: -11.500 to COG1361 for the
BXA0092 protein). Interestingly, in species other than B.anthracis, these two proteins often
colocalize with three others: a protein homologous (FFAS score: -10.100) to a LysM
domain-containing protein (the homology lies outside the LysM region), a protein
homologous to the RTX toxin and related Ca2+-binding proteins family (FFAS: -12.000 to
COG2931), and a regulatory protein homologous to the MGA positive transcriptional
regulators (FFAS: -85.100). The LysM domain binds peptidoglycan and was first identified
in bacterial lysins [Ponting, 1999]. Several proteins, such as staphylococcal IgG-binding
proteins and E.coli intimins, contain LysM domains. RTX toxins are pore-forming,
calcium-dependent cytotoxins encoded by various bacterial genomes [Braun, 2000], and
MGA regulators are important in streptococcal virulence [McIver, 2002]. Other unknown
proteins from these operons are predicted to be extracellular. Together, these observations
strongly suggest that these related operons may be involved in pathogenesis.

Type  IV  secretion  system  machinery 

In two operons we identified proteins strongly resembling known type IV
secretion system proteins. The first is composed of BXA0083, a protein involved in
type IV pili biogenesis, CpaB/RcpC (COG3745) (CDD score: 1e-18), the VirB11 family
protein BXA0085 (CDD: 1e-41), and two homologous unknown proteins, BXA0086 and
BXA0087, that belong to the TadC family (COG2064), often found in an operon with the
VirB11 protein (FFAS: -19.400 and -12.300, respectively). The second operon contains
the homologue of the VirB4 protein (BXA0107) (FFAS: -64.100) and BXA0108, a fusion of
the VirB6 homology region with a surface-located repetitive sequence, similar to
coiled-coil proteins, with a methyl-accepting chemotaxis protein (MCP) signaling domain
at the C terminus (FFAS: -11.600 to VirB6, -11.700 to myosin tail and -23.200 to the
MCP domain).

In the first operon, the best-studied protein is VirB11 [Dang, 1999; Krause, 2000;
Yeo, 2000; Christie, 2001; Savvides, 2003]. The model predicts that VirB11-family
ATPases “function as chaperones reminiscent of the GroEL family for translocating
unfolded proteins across the cytoplasmic membrane” [Christie, 2001]. BXA0083, its two
homologues BXA0086 and BXA0087, and the VirB11 homologue BXA0085 are all distant
homologues of proteins that also form an operon in many gram-negative bacterial
species [Kachlany, 2000; Skerker, 2000]. The CpaB/RcpB and TadC genes form large
families, widespread in bacteria and archaea [Kachlany, 2000]. The cpa operon in
Caulobacter crescentus was shown to be required for pilus assembly [Skerker, 2000].
Surprisingly, we could not identify a homologue of the pilA gene or of any other pilin
subunit, which is necessary for pilus formation.

The  BXA0107  protein  from  the  second  operon  belongs  to  the  large  VirB4  family. 
It  is  one  of  the  elements  of  the  type  IV  secretion  system  important  in  the  delivery  of 
effector  molecules  to  the  host  cell  [Christie,  2000;  Christie,  2001  and  citations  therein]. 
This  system,  ancestrally  related  to  the  conjugation  machinery,  is  able  to  deliver  DNA 
molecules as well as proteins. VirB4 is an ATPase that “might transduce information,
possibly in the form of ATP-induced conformational changes, across the cytoplasmic
membrane to extracytoplasmic subunits,” according to Christie [Christie, 2001] and Dang
[Dang, 1999]. It contains the Walker A motif responsible for ATP binding, which is well
conserved in BXA0107 (residues 200-207: GISGSGKS).
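The Walker A (P-loop) consensus is commonly written G-x(4)-G-K-[S/T], and the BXA0107 octapeptide GISGSGKS fits it. A minimal sketch of such a consensus check is given below; it is a plain pattern match for illustration only, not the profile-based searches used in this analysis, and the function name `find_walker_a` is hypothetical.

```python
import re

# Walker A (P-loop) consensus G-x(4)-G-K-[S/T].
# The lookahead lets overlapping candidate sites all be reported.
WALKER_A = re.compile(r"(?=(G.{4}GK[ST]))")


def find_walker_a(protein: str, start: int = 1) -> list[tuple[int, int, str]]:
    """Return (first, last, motif) for every Walker A match.

    Positions are 1-based; `start` is the residue number of protein[0],
    so a fragment can be scanned while keeping full-length numbering.
    """
    hits = []
    for m in WALKER_A.finditer(protein.upper()):
        motif = m.group(1)
        first = m.start(1) + start
        hits.append((first, first + len(motif) - 1, motif))
    return hits


# The octapeptide reported for BXA0107, numbered from residue 200.
print(find_walker_a("GISGSGKS", start=200))  # [(200, 207, 'GISGSGKS')]
```

A regular expression only flags candidate sites; whether a match is a functional ATP-binding loop still depends on its structural context.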

BXA0107 forms an operon with the BXA0108 protein, which has at least seven
predicted transmembrane segments in its N-terminal region (residues 55-281), similar to
the central part of the VirB6 protein, and a surface-located repetitive sequence, most
probably forming a coiled-coil structure. Its C-terminal region is homologous to a domain
that is thought to transduce the external chemotaxis signal to the two-component histidine
kinase CheA (for review see [Stock, 2002]).

The next protein in this operon resembles the C-terminus of a Bacillus firmus
integral membrane protein whose N-terminal part consists of transmembrane domains.
This region is homologous to phosphatidate cytidylyltransferase (EC 2.7.7.41), an
enzyme that catalyzes the synthesis of CDP-diglyceride, the source of phospholipids in
all organisms [Sparrow, 1985; Icho, 1985]. The function of the C-terminal part of the
B.firmus protein is unknown.

The presence of three proteins with features characteristic of type IV secretion
systems, and of other ORFs related to type IV pilus formation, is completely unexpected.
Unfortunately, we were not able to detect any other elements of this machinery on the
plasmids or the chromosome. Is the presence of incomplete operons an evolutionary
artifact, a minimal complex to deliver molecules to the host, or part of a larger complex
not yet recognized with the available tools? These operons are good targets for
experimental analysis. The discovery of putative molecules secreted by this system may
be crucial for our understanding of the diverse roles of pXO1 in virulence.


Particular  cases 

The statistical analysis of the pXO1 megaplasmid is a convenient way to describe
its general physiology. It does not, however, allow one to understand the complexity of
each protein’s function. In the detailed analysis of B.anthracis ORFs we therefore focused
on particular cases, especially those from the pathogenicity region [Okinaka, 1999;
Sirard, 2000], which are of special interest to the scientific community.

BXA0139 

The BXA0139 protein is located close to the edema factor (CyaA) on the pXO1
sequence. It is 150 amino acids long and is located in an operon with two hypothetical
proteins of unknown function, BXA0138 and BXA0140. The only known fact about these
proteins is the similarity of BXA0138 to BXA0149 (Table 1). The most interesting
finding is the homology of BXA0139 to the C-terminal end of hemolysin II from
B.cereus [Miles, 2002]. This homology has already been described by Miles et al. (2002),
but only as a similarity to a 46-amino acid segment of BXA0139. In fact, BXA0139
consists of a duplication of this fragment, and the C-terminus of hemolysin II is similar to
both the N- and C-terminal parts of BXA0139 (Fig. 3). The significance of the
C-terminus of hemolysin II in B.cereus is unknown, and functional studies suggest that it
has no influence on the hemolytic activity of the enzyme [Baida, 1999; Miles, 2002].
Hemolysins form heptameric rings [Song, 1996; Gouaux, 1997], in which the C-terminal
domain would reside on the outside of each monomer [Miles, 2002]. Miles and
colleagues (2002) suggest three possible functions for this domain, although they do not
exclude other possibilities: it may be needed to form lattices or to bind to surfaces, or it
may have some catalytic activity. We also hypothesize an auxiliary, possibly regulatory,
function for the main domain of the monomer. Quite peculiar is the presence of a tandem
tail-to-head repeat encoded by the pXO1 plasmid. It is not fused to any catalytic domain,
and no overall function for the whole operon is known. The most attractive hypothesis
would be binding to surfaces. Might it serve as an anchor to the host cell membrane
during the attack?

An interesting finding may give a clue to the real function of BXA0139. We found a
hemolysin II homologue in the B.anthracis genome (gi: 21400399) that is almost identical
to the B.cereus enzyme. However, in all sequenced anthrax strains there is a nonsense
mutation (TGG to TGA) at the codon corresponding to tryptophan 372 of the B.cereus
enzyme. To reconstruct the actual sequence, rather than rely on the automatic translation
deposited at NCBI, we ran the BLASTX program on the genomic sequence with large
overhangs on both sides of the recognized ORF. The resulting sequence is given in the
alignment in Figure 3. Thus, if the anthrax mutation is real (and its presence in all anthrax
strains seems to reinforce this notion), we can hypothesize that BXA0139 is auxiliary to the
hemolysin’s function. It may contribute to some attack-related function that has nothing to
do with the hemolytic activity.
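The consequence of the TGG-to-TGA change can be made explicit with a codon lookup. The sketch below assumes the standard genetic code (NCBI translation table 1) and uses hypothetical helper names; it simply classifies a single-codon substitution, showing that TGG encodes tryptophan while TGA is a stop codon, so translation terminates at that position.

```python
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate complete codons of an in-frame DNA sequence ('*' = stop)."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[p:p + 3]] for p in range(0, len(dna) - 2, 3))


def substitution_effect(ref_codon: str, alt_codon: str) -> tuple[str, str, str]:
    """Classify a single-codon change as synonymous, nonsense, stop-loss or missense."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        kind = "synonymous"
    elif alt_aa == "*":
        kind = "nonsense"
    elif ref_aa == "*":
        kind = "stop-loss"
    else:
        kind = "missense"
    return ref_aa, alt_aa, kind


print(substitution_effect("TGG", "TGA"))  # ('W', '*', 'nonsense')
print(translate("ATGTGGAAA"), translate("ATGTGAAAA"))  # MWK M*K
```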

BXA0167 

This hypothetical ORF has no known close or distant homologues, and its function
is not known. It is a product of automatic translation. One might therefore assume that it
is not an interesting target for analysis.

However, a BLASTX analysis along its sequence revealed an interesting homology
encoded on the opposite strand. Although loaded with nonsense mutations, this region
shows strong homology to the N-terminus of the lethal factor (amino acids 9-176 of LEF)
(data not shown). Notably, this homologous region is encoded on the strand opposite to
the LEF gene. Is it an example of a duplication event obscured by other events that
happened later in the course of evolution? Was this part of the N-terminal LEF domain
functional in the past?
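BLASTX compares a nucleotide query, translated in all six reading frames, against a protein database; this is why a homology lying on the opposite strand and interrupted by stop codons can still be detected. A minimal sketch of only the six-frame translation step follows, assuming the standard genetic code; the function names are hypothetical and the database search itself is not shown.

```python
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    # Codons containing ambiguous bases (e.g. N) are rendered as 'X'.
    return "".join(CODON_TABLE.get(dna[p:p + 3], "X") for p in range(0, len(dna) - 2, 3))


def six_frame(dna: str) -> dict[str, str]:
    """Translate frames +1..+3 (given strand) and -1..-3 (reverse complement)."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


# Hypothetical toy sequence whose reverse strand encodes "MKW".
for frame, peptide in six_frame("CCATTTCAT").items():
    print(frame, peptide)
```

Stop codons are kept as '*' rather than splitting the translation, so a degenerate, stop-riddled copy of a gene remains visible as one continuous stretch.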

pXO1 regulators

The most important elements in the description of unknown biological systems
are the regulatory proteins. They determine which genes are expressed in the cell, when,
and how. In pathogenic systems, regulators of virulence genes are frequently located in
pathogenicity regions. However, various permutations are known: regulators may control
genes outside the pathogenicity island, regulators encoded outside the pathogenicity island
may control genes located in virulence regions, or regulators may control virulence
factors as well as other genes unrelated to pathogenicity [for review see: Hacker, 2000;
Hentschel, 2001]. We therefore think that it is essential to describe the regulators on the
anthrax pathogenicity plasmid in order to decipher the physiology of pXO1.

BXA0020 

BXA0020 is 564 amino acids long. Its C-terminal 60-70 aa are homologous to the
DNA-binding domains of several repressor families (SCOP: a.35.1, the superfamily of
lambda repressor-like DNA-binding domains). The most similar (FFAS score: -11.900) is
the SinR repressor domain [Gaur, 1986]. In Bacillus subtilis, the proteins of the sin
(sporulation inhibition) region form a component of an elaborate molecular circuitry that
regulates the commitment to sporulation. SinR is a tetrameric repressor protein that binds
to the promoters of genes essential for entry into sporulation and prevents their
transcription [Mandic-Mulec, 1992; Mandic-Mulec, 1995]. On the B.anthracis pXO1
plasmid, BXA0020 does not form an operon with sin genes. Instead, it is located close to a
protein (BXA0019) that is similar to the middle fragment (417-1236 aa) of the 236 kDa
rhoptry protein from Plasmodium yoelii yoelii, which is directly involved in the parasite’s
attack on red blood cells [Khan, 2001]. It is not certain whether they form one operon,
since both genes have putative independent ribosome-binding sites. The N-terminal
region of BXA0020 is not well described and has the strongest similarity to the α-helical
part of the chromosome-associated kinesin (e-value: 6e-06 to the A.thaliana protein, gi:
22327992) or to the kinesin-like domain (KOG0244). Kinesins are microtubule-dependent
molecular motors that play important roles in the intracellular transport of organelles and
in cell division [Woehlke, 2000; Mandelkow, 2002].

BXA0048 

The N-terminal part of BXA0048 is a DNA-binding helix-turn-helix motif that
belongs to the TetR family (PF00440). Members of this family take part in the regulation
of numerous pathways/operons, e.g. TetR, a tetracycline-inducible repressor [Hillen,
1994]; BetI, a repressor of the osmoregulatory choline-glycine betaine pathway [Lamark,
1996]; and MtrR, a regulator of cell envelope permeability that acts as a repressor of the
mtrCDE-encoded and an activator of the farAB-encoded efflux pumps [Lee, 1999; Lee, 2003].
We were unable to detect any reasonable homology for the distal part of BXA0048;
therefore no functional hypothesis can be drawn. The only indication of this regulator’s
function is its probable placement in one operon with a nucleotidyltransferase
(BXA0047). The presence in the same operon of a nucleotidyltransferase (gi: 5459398)
and a superfamily II DNA and RNA helicase family protein (gi: 5459399) in
Streptomyces coelicolor suggests that BXA0048 may be involved in DNA
metabolism.

BXA0060 

BXA0060 belongs to the large superfamily of repressors (SCOP: a.35.1). It is
composed of the DNA-binding domain only. Homologues of BXA0060 are present in
numerous archaeal and eubacterial genomes and do not share a conserved operon
structure. It seems, then, that BXA0060 homologues are involved in very diverse
functions/pathways.

BXA0069 

BXA0069 belongs to the family of global transcription activators of membrane-bound
multidrug transporters responsible for bacterial multidrug resistance (MDR)
[Paulsen, 1996]. The closest homologue is the B. subtilis MtnA regulator, which
belongs to the MerR family (FFAS: -42.300) [Summers, 1992]. It is known to activate
two MDR transporters (bmr and blt), a transmembrane protein-coding gene, ydfK, and
its own gene [Baranova, 1999]. It acts independently of two specific activators, BmrR
and BltR, which are encoded by the bmr and blt operons [Ahmed, 1995].


MtnA and other members of the MerR family are composed of three regions: an
N-terminal DNA-binding domain (winged helix-turn-helix motif), a middle all-helical
dimerization region and a C-terminal part, specific for each protein, that is probably
involved in specific ligand binding [Godsey, 2001]. BXA0069 fits this description well:
it possesses two fairly well conserved distal regions and a 90-amino acid region with no
detectable homology that has an almost 80% probability of forming a coiled-coil structure
[Lupas, 1991]. Because the C-terminus does not resemble any known regulatory domain, it
is difficult to propose which metabolic pathway or gene(s) the BXA0069 protein
activates.

BXA0122 

The FFAS analysis revealed weak similarity (FFAS: -10.500) of BXA0122 to
the MarR regulators of the multiple antibiotic resistance locus [Seoane, 1995; Grkovic,
2002].  This  regulon  consists  of  the  marRAB  operon  and  the  marC  gene.  MarR  acts  as  a 
repressor  by  binding  as  a  dimer  to  promoter  regions  of  the  mar  regulon  [Martin,  1995]. 
The  repressive  DNA-binding  by  MarR  can  be  inhibited  by  several  anionic  compounds, 
e.g.  salicylate  [Alekshun,  1999]. 

AtxA 

AtxA is a proven regulator of anthrax toxin genes [Uchida, 1993; Koehler, 1994;
Dai, 1995]. It is also known to influence the expression of other genes on the pXO1 and
pXO2 plasmids and in the anthrax genome [Bourgogne, 2003]. AtxA is a member of the
family containing the PTS (phosphoenolpyruvate-dependent, sugar-transporting
phosphotransferase system) regulatory domain [Greenberg, 2002]. Members of this family
usually have a duplicated DNA/RNA-binding domain and also a duplicated PTS regulatory
domain. Different variants of this architecture are known, and additional domains are often
present. The presence of PTS EII homology domains is consistent with AtxA being an
activator, since these domains are lacking in antiterminators [Greenberg, 2002]. Knowing
the architecture of this family, we searched the whole anthrax genome in order to find all
similar regulators. Among those we found, apart from the obvious AtxA and AcpA
proteins, is BXB0060 (pXO2-53), whose activity has very recently been confirmed and
which has been named AcpB [Drysdale, 2004]. The diversity of domain composition and the
subtle structural differences in this group of evolutionarily related anthrax regulators are
certainly elements of a very fine regulation of the stages of infection.

BXA0166  +  BXA0207 

Both BXA0166 and BXA0207 are members of the ArsR/SmtB family of
metalloregulatory transcriptional regulators. The vast majority of known family members
are repressors. Indeed, BXA0166 has been characterized as the gene for the repressor PagR
[Hoffmaster, 1999]. Family members act on operons linked to stress-inducing
concentrations of diverse heavy metal ions. Derepression results from direct binding of
metal ions by the ArsR/SmtB transcriptional regulators. The founding members of the
family are SmtB, the Zn(II)-responsive repressor from Synechococcus PCC 7942 [Morby,
1993], and ArsR, the arsenic/antimony-responsive repressor of the ars operon in
Escherichia coli [Wu, 1991]. Another, less well studied, group in the ArsR/SmtB family
comprises the transcriptional activators, with Vibrio cholerae HlyU as the founding
member [Williams, 1993]. HlyU is known to upregulate the expression of hemolysin and
of two hep genes, which are coregulated with hemolysin [Williams, 1996]. We have
conducted a phylogenetic analysis of this vast family, focusing on the evolutionary history
of ArsR/SmtB proteins in bacilli, notably in B.anthracis, and on the relation between
phylogeny and function (i.e. repressor or activator).

In a phylogeny of representative members of the ArsR/SmtB family, the two
pXO1 proteins group closely together with other Bacillus proteins. This group has very
long branches in the tree, indicative of rapid evolution of the proteins. The only two
known activators of the family (HlyU and NolR) appear closely related, in a clade with
proteins of unknown function. These latter include clear orthologues of HlyU or of NolR.
It is thus reasonable to predict that these proteins form a clade of transcriptional activators.
Interestingly, this "activator" clade appears closely related to the clade including both
pXO1 proteins. PagR is known to act as a repressor, but only weakly [Hoffmaster,
1999], and is suspected of having an activation function as well [Mignot, 2003]. A more
detailed phylogeny of close homologues of the pXO1 proteins (Fig. xxxB) shows that
there has been a wave of gene duplications in the ancestor of B.anthracis and B.cereus. All
seven of the resulting paralogues were retained in B.anthracis, including the two that
were transferred to pXO1, while four were secondarily lost in B.cereus. There was an
independent duplication in B.thuringiensis. Interestingly, these are the only bacilli
represented in this clade of close homologues; all three have duplications of the gene, and
all three are pathogens.

Overall, the phylogenetic analysis shows that both pXO1 ArsR/SmtB proteins are
closely related members of a clade of fast-evolving proteins, which have duplicated
several times in pathogenic bacilli and which are related to the only clade of
transcriptional activators in the family.

BXA0178 

BXA0178 belongs to the AbrB family of “transition state regulators.” AbrB was
first described in Bacillus subtilis as an activator and repressor of numerous genes during
transitions in growth phase [Trowsdale, 1978; Philips, 2002]. Recently, Saile and Koehler
[Saile, 2002] showed that the chromosomal copy of AbrB in B.anthracis regulates the
expression of three toxin genes, whereas the truncated pXO1 version (BXA0178) of
AbrB does not affect toxin gene expression. We can therefore speculate either that the
truncation is crucial for BXA0178 function or that its influence on pXO1 function is not
yet understood.

BXA0180 

According to FFAS analysis, BXA0180 corresponds to the N-terminal part of the lambda
repressor-like DNA-binding domain superfamily (a.35.1), as classified by the SCOP
database (Structural Classification of Proteins) (FFAS: -12.200) [Andreeva, 2004]. The
ORF is truncated after the first half of the domain, and experiments are needed to check
whether a shortened domain can exert any function.

BXA0206 

BXA0206 belongs to the large family of Hfq proteins. Members of this family are
known to be involved in various processes, such as the regulation of iron metabolism
[Wachi, 1999; Masse, 2002], mRNA stability [Vytvytska, 1998], and the stabilization and
degradation of RNAs [Tsui, 1997; Takada, 1999]. Hfq proteins are similar to the
eukaryotic Sm proteins involved in RNA splicing [Moller, 2002]. The function of the
pXO1 version is not known, and the RNA targeted by BXA0206 has not been identified.
The question remains whether BXA0206 acts on an RNA encoded by the plasmid itself or
has another function, e.g. acting on a chromosomal small RNA or mimicking the human
Sm protein.


DISCUSSION 

We have uncovered several novel features of the pXO1 plasmid. We showed that parts
of pXO1 are related not only to plasmids of other bacilli, but also to proteins from more
distant species. One of the most unexpected findings was that pXO1 possesses two
operons with homology to type IV secretion and pilus assembly systems. This is
surprising because the type IV system is typically found in Gram-negative bacteria. It
remains to be seen whether we are dealing with a minimal set indispensable for the
formation and function of the secretion apparatus, a nonfunctional remnant set of
proteins, or a set whose function has changed from the original one. The discovery of a
type IV secretion system could have a significant impact on our understanding of anthrax
virulence. If this system is functional, a previously unknown pathogenic delivery pathway
may be important in the invasion process.

The similarity to various other bacteria and the copying of parts of operons reveal the
phylogenetic kaleidoscope nature of this megaplasmid. Apparently, the borrowing of
diverse ideas allowed the formation of this killing agent. It is worth noting that pXO1
also shares similarity with other pathogenic bacteria in regions not previously recognized
as part of the pXO1 pathogenicity island (see the operon preservation with
Burkholderia and Xanthomonas in Results).

The discovery of previously unknown systems would mean little if we did not also ask
questions about regulation. External signals, cell state or host-pathogen interaction
certainly trigger bacterial responses, and a few of them are already known (for
review see [Koehler, 2002]). All these signals ultimately activate transcription of
virulence-related genes. We tried to describe all possible regulators that we could find
using programs more sensitive than BLAST. Some of the regulatory proteins are known
not to influence toxin gene expression (e.g. the AbrB homologue), but others may be of
interest to researchers studying B.anthracis. Further experimental studies are needed to
determine whether the newly discovered factors regulate plasmid or chromosomal genes.

We do not know how reliable the common motif found for AtxA-regulated genes is.
Its variable location within the putative promoter regions (closer to or farther from
the ATG) calls its reliability into question. However, we do not know whether all the
genes are correctly predicted, or whether there are unrecognized ORFs 5’ of the
AtxA-dependent ones. In that case, the recognized ANGGAG sequence would directly
precede the operon. Deletion experiments are needed to test whether these cis elements
have any impact on the function of AtxA-regulated genes.
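Once a consensus is in hand, its variable spacing can be tabulated directly. The sketch below scans upstream regions for the ANGGAG consensus (N = any nucleotide) and reports the gap, in nucleotides, between each hit and the downstream ATG, assuming each region ends immediately before the start codon. It is a plain pattern scan, not the de novo motif discovery performed by MEME or MITRA, and the sequences and function name are hypothetical, not taken from pXO1.

```python
import re

# ANGGAG consensus, N = any nucleotide; the lookahead reports overlapping hits.
MOTIF = re.compile(r"(?=(A[ACGT]GGAG))")


def motif_offsets(upstream: str) -> list[int]:
    """Gap (nt) between the 3' end of each motif hit and the ATG.

    Assumes `upstream` ends immediately before the start codon.
    """
    seq = upstream.upper()
    return [len(seq) - (m.start(1) + 6) for m in MOTIF.finditer(seq)]


# Hypothetical upstream regions with different motif spacing.
regions = {
    "hypothetical_1": "TTAAGGAGTTTT",
    "hypothetical_2": "C" + "ATGGAG" + "A" * 15,
    "hypothetical_3": "CCCCCCCC",
}
print({name: motif_offsets(seq) for name, seq in regions.items()})
# {'hypothetical_1': [4], 'hypothetical_2': [15], 'hypothetical_3': []}
```

A widely varying gap across genes, as in this toy output, is the pattern that makes the motif's role as a fixed-position cis element questionable.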


The analysis of the ArsR/SmtB family of regulators was also interesting. Apparently,
B.anthracis is armed with all kinds of ArsR homologues; however, the majority of them,
including the pXO1 homologues, are related to the activator subfamily. The
functions of the MarR and TetR regulators are also intriguing.

Two striking features of the whole plasmid drew our special attention.
First is the presence of so many DNA metabolism-related proteins (15%). It seems that
DNA is central to pXO1’s function. Is this function related to the processing of
pXO1, chromosomal DNA, transposons, or host DNA? We do not know, and none of these
hypotheses can be excluded at the moment. The type IV delivery system could be an
indication that some of these proteins have an external function.
Second, when analyzing the DNA and proteome of pXO1 we realized how messy it is.
pXO1 is full of incomplete and mutated ORFs. There are many traces of ancient
duplications, some still fresh (strong homology), others almost completely faded away
(homology barely recognizable), and often disrupted. It also contains ORFs
“borrowed” from other species. It seems to be subject to constant evolutionary
pressure. This plasmid should carry a tag: “under construction.”


METHODS 

DNA  level  analysis 

The Bacillus anthracis strain A2012 pXO1 plasmid sequence was used for analysis
(accession: NC_003980) [Read, 2003]. For the analysis of common DNA features in
promoter regions of AtxA-dependent genes [Bourgogne, 2003], we used the complete DNA
sequence between the end of the preceding gene and the neighbourhood of the ATG of the
AtxA-regulated gene. We used the 5’ regions of the following genes from the pXO1 and
pXO2 plasmids: BXA0019, BXA0124, BXA0125, BXA0137, BXA0142 (cyaA), BXA0164
(pagA), BXA0172 (lef), BXB0045, BXB0060, BXB0066, BXB0074, BXB0084. We
used the MEME and MITRA programs to search for common motifs [Bailey, 1994; Eskin,
2002].

Protein  level  analysis 

For the analysis of the pXO1 proteome, we used proteins accessible under the BXAxxxx
NCBI numbers, supplemented with BLASTX analysis [Altschul, 1990]. To analyze the
protein sequences, we used the following programs: BLAST tools [Altschul, 1990;
Altschul, 1997], SMART tool [Letunic, 2002], Pfam [Bateman, 2002], CDD [Marchler-Bauer,
2003], TMHMM2.0 [Sonnhammer, 1998], SEED [], Radar [Heger, 2000],
FFAS03 [Rychlewski, 2000], Metaserver.pl [Elofsson, 2003] and Superfamily [Gough,
2001]. To align sequences we used T-COFFEE [Notredame, 2000], AliBee [Nikolaev,
1997], MultAlin [Corpet, 1988] and BioEdit [Hall, 1999]. Phylogenetic trees were estimated
from amino acid alignments using PHYML (Guindon and Gascuel 2003), a fast and
accurate Maximum Likelihood heuristic, under the JTT substitution model (Jones, Taylor
et al. 1992), with a gamma distribution of rates across sites (eight categories, parameter
alpha estimated by PHYML). Bootstrap support of branches was estimated using the
programs SEQBOOT and CONSENSE of the PHYLIP package (Felsenstein 2002) with
1000 replicates; the parameter alpha was estimated independently for each replicate.
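The bootstrap procedure resamples alignment columns with replacement to build pseudo-replicate alignments (the role of SEQBOOT), infers a tree from each, and reports how often each branch, i.e. each bipartition of taxa, recurs across the replicate trees (the counting performed by CONSENSE). The sketch below illustrates only the resampling and the frequency count; tree inference with PHYML is not reproduced, splits are assumed to be supplied as frozensets of taxon names in a canonical orientation, and all names are hypothetical.

```python
import random
from collections import Counter


def bootstrap_alignment(alignment: dict[str, str], replicates: int,
                        seed: int = 0) -> list[dict[str, str]]:
    """Resample alignment columns with replacement (SEQBOOT-style)."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    ncols = lengths.pop()
    rng = random.Random(seed)
    out = []
    for _ in range(replicates):
        cols = [rng.randrange(ncols) for _ in range(ncols)]
        out.append({name: "".join(seq[c] for c in cols)
                    for name, seq in alignment.items()})
    return out


def split_support(replicate_splits: list[set[frozenset[str]]]) -> dict[frozenset[str], float]:
    """Fraction of replicate trees containing each split (CONSENSE-style count)."""
    counts = Counter(split for splits in replicate_splits for split in splits)
    n = len(replicate_splits)
    return {split: c / n for split, c in counts.items()}


# Hypothetical three-taxon protein alignment.
alignment = {"taxonA": "MKLV", "taxonB": "MKIV", "taxonC": "MRIV"}
replicates = bootstrap_alignment(alignment, replicates=1000, seed=42)
print(len(replicates))  # 1000
```

With 1000 replicates, a branch recovered in 950 replicate trees would receive 95% bootstrap support.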

We have identified a new domain in a broad range of bacterial, as well as a few archaeal and plant, proteins. Its presence in
the virulence-related pXO1 plasmid of Bacillus anthracis, as well as in several other pathogens, makes it a possible drug target.
We term the new domain nuclease-related domain (NERD) because of its distant similarity to endonucleases.


Anthrax, a disease of herbivores and primates (including
humans), is caused by a gram-positive, spore-forming
bacterium, Bacillus anthracis. The virulence of this
bacterium depends on two megaplasmids: pXO1,
which is required for the synthesis of the toxin protein [1],
and pXO2, which is required for the synthesis of an
antiphagocytic capsule [2-4]. Strains lacking either of the two
megaplasmids are avirulent.

The pXO1 plasmid has been analyzed in several recent
genome sequence studies [5-7] using standard tools
such as BLAST. Using sensitive homology-detection
algorithms, we have found that a 117-amino acid fragment
of the pXO1-01 protein, previously annotated as a
hypothetical protein, defines a new domain that is shared
by multiple proteins in other eubacteria and is also
present in small numbers of archaeal and plant proteins.
We call it NERD, for nuclease-related domain.

The NERD domain

Starting from the amino acid sequence of the B. anthracis pXO1-01 protein, a cascade of PSI-BLAST searches [8] identified >40 proteins with varied domain combinations (Figure 2). Each of these proteins contains a region displaying statistically significant sequence similarity to the seed protein and to the other proteins (Figure 1). The NERD domain partly overlaps two Pfam-B domains, Pfam-B_22501 and Pfam-B_26882 [9]. However, these Pfam-B families contain only a few sequences (5 and 4, respectively), each in a single-domain context. An alignment of NERD, covering 117 amino acids, is presented in Figure 1.
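In a search cascade, each newly found homolog becomes a query for the next round. This idea can be sketched as a breadth-first traversal. The sketch is illustrative only. `search` is a hypothetical stand-in for one PSI-BLAST run that returns (hit, E-value) pairs. The E-value cutoff and depth limit are assumed parameters, not the settings used in this work.

```python
from collections import deque

def search_cascade(seed, search, e_cutoff=1e-3, max_depth=10):
    """Transitively collect homologs: each significant new hit is searched in turn.

    `search(query)` is a hypothetical callable returning (hit_id, e_value) pairs.
    """
    found = {seed}
    queue = deque([(seed, 0)])
    while queue:
        query, depth = queue.popleft()
        if depth >= max_depth:
            continue
        for hit, e_value in search(query):
            if e_value <= e_cutoff and hit not in found:
                found.add(hit)
                queue.append((hit, depth + 1))
    return found

# Toy hit table standing in for database searches (made-up identifiers and E-values).
hits = {
    "pXO1-01": [("protA", 1e-10), ("protB", 0.5)],
    "protA": [("protC", 1e-5)],
    "protC": [("pXO1-01", 1e-20)],
}
print(sorted(search_cascade("pXO1-01", lambda q: hits.get(q, []))))
```

The `found` set stops the cascade from revisiting a protein that has already been collected. In this toy table, that happens when protC points back to the seed.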

The NERD sequence is characterized by three conserved regions interspersed among weakly conserved or very diverse regions (Figure 1). In the alignment, conserved hydrophobic, mainly aliphatic motifs (consisting of Leu, Ile and Val) alternate with polar, mainly charged positions (e.g. Asp, His, Glu and Lys).

The first and most conserved region consists of:
- the N-terminal Glu;
- the [Gln/Glu]-[Ile/Val/Leu]-Asp motif;
- a stretch of hydrophobic residues, ending with two polar (Glu and Lys) and two hydrophobic (Gly and [Ile/Leu/Val]) residues.

The next 20 amino acids are not conserved. The [Ser/Asn]-Pro-[Ile/Leu/Val/Met] motif, together with a neighboring Gln, forms the second conserved region. The third region comprises the C-terminal 25 amino acids, in which mainly the hydrophobic amino acids are conserved.

An interesting feature of NERD is the existence of subgroups that depart from the rest of the family in two ways. Some lack motifs that are conserved in all other members (e.g. two N-terminal glycine residues are missing in the plant domain). Others show a charge difference (e.g. Glu instead of Gln in the most conserved [Gln/Glu]-[Ile/Val/Leu]-Asp motif). We can only hypothesize that these differences account for functional diversity within the NERD family.
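The two motifs named above translate directly into regular expressions over one-letter amino acid codes: [QE][IVL]D and [SN]P[ILVM]. The sketch below shows how such motifs could be located in a sequence. It uses a hypothetical toy sequence and is a plain pattern scan, not the profile-based method used to define the domain. Note that `finditer` reports non-overlapping matches only.

```python
import re

# One-letter translations of the motifs described in the text.
MOTIFS = {
    "[Gln/Glu]-[Ile/Val/Leu]-Asp": re.compile(r"[QE][IVL]D"),
    "[Ser/Asn]-Pro-[Ile/Leu/Val/Met]": re.compile(r"[SN]P[ILVM]"),
}

def find_motifs(sequence):
    """Return {motif name: [(1-based start, matched residues), ...]}."""
    sequence = sequence.upper()
    return {
        name: [(m.start() + 1, m.group()) for m in pattern.finditer(sequence)]
        for name, pattern in MOTIFS.items()
    }

# Hypothetical toy sequence, not a real NERD protein.
print(find_motifs("MKEQVDLLGKSPIQ"))
```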

The NERD domain has a predicted secondary structure of α-β-β-β-β-(weak β/long loop)-α-β-β. This prediction helps rationalize the conservation of specific regions of the domain (Figure 1), because all the conserved residues coincide with secondary-structure elements, especially the third and fourth β strands. The only exception is the fifth β strand, which is likely to be a terminal strand or a long loop (Figure 1).

NERD-domain associations

The majority of NERD-containing proteins are single-domain proteins, in several cases with additional (predicted) transmembrane helices. Only in a few instances do NERD-containing proteins have additional domains; in 75% of these cases, the additional domains are involved in DNA processing. In all multidomain proteins, NERD is found at the N terminus. There is also no evident operon conservation for NERD-containing proteins and no apparent connection between phyla and domain fusions.

Most NERD-containing proteins, including the group-defining B. anthracis pXO1-01 protein, consist entirely of the NERD domain, sometimes with short tails of several amino acids at both the N and C termini. All proteins in this group are hypothetical open reading frames (ORFs). In addition, in several proteins the NERD domain is associated with one or two predicted transmembrane motifs, which can be located at either the N or the C terminus (Figure 2).

In a hypothetical Clostridium perfringens protein (gi: 18309656), the NERD domain is followed by the helicase and RNase D C-terminal (HRDC) domain (PF00570; Figure 2). HRDC is an 80-amino acid domain usually found at the C terminus of RecQ helicases and RNase D homologs from various organisms, including human [10]. An HRDC domain is present in genes linked to the human diseases Werner and Bloom syndromes [11,12]. The HRDC domain is involved in binding to specific DNA structures (e.g. long-forked duplexes and Holliday junctions) that are formed during replication, recombination or transcription [13].

Interestingly, in many HRDC-containing proteins, the N-terminal region is the 3'→5' exonuclease domain (PF01612). This domain is responsible for the 3'→5' exonuclease proofreading activity of DNA polymerase I and other enzymes and catalyzes the hydrolysis of unpaired or mismatched nucleotides [14,15]. One can speculate that NERD, which occurs in an analogous arrangement with the HRDC domain, has a related function.

In at least three proteins, including the hypothetical protein (gi: 22972752) from Chloroflexus aurantiacus, the NERD domain is found at the N terminus of UvrD/Rep 3'→5' DNA helicases (PF00580). These helicases catalyze the ATP-dependent unwinding of double-stranded DNA into single-stranded DNA (ssDNA) [16]. DNA helicases are essential for processes such as DNA replication, recombination and repair [17]. This domain co-occurs with the HRDC domain in several bacterial species, namely Streptomyces coelicolor, Corynebacterium glutamicum, Mycobacterium leprae and Mycobacterium tuberculosis.

In two proteins, from Pseudomonas aeruginosa (gi: 4406504) and Bacteroides uniformis (gi: 8308027), NERD is followed by the DNA-binding C4 zinc finger (PF01396) (Figure 2). This short motif is usually found in the C-terminal region of prokaryotic topoisomerases I [18]. The role of topoisomerase I in the bacterial cell is to remove excessive negative supercoils from DNA to maintain the optimal superhelical state [19]. The zinc motifs do not cleave or recognize the topoisomerase substrate; rather, they are believed to interact with ssDNA to relax negatively supercoiled DNA [20]. Apart from topoisomerases, a few proteins possess C4 zinc fingers next to restriction endonuclease domains (PF04471) or uncharacterized N termini, but the role of these zinc fingers is unknown.

In five proteins, the NERD domain is followed by two STYKc domains (PF00069). STYKcs are protein kinases with possible dual serine/threonine and tyrosine kinase specificity [21]. The genomic contexts of two of these proteins point to nucleotide-related partners:
- in Thermomonospora fusca, genes for DNA polymerase III and a transposase;
- in Streptomyces coelicolor, a gene for an adenine-specific methyltransferase.
These associations may suggest a nucleotide-related function for these large proteins (ERGO database: http://ergo.integratedgenomics.com).

In most cases, only one copy of the NERD domain is present in a given organism. We found two copies of NERD per genome in only three bacteria: Burkholderia fungorum, Oceanobacillus iheyensis and Desulfitobacterium hafniense.

pXO1-01 function

None of the NERD-containing proteins has been studied experimentally; therefore, their exact function is not known. However, bioinformatics analyses offer some clues.

The closest homolog of pXO1-01 is the orf8 protein from Bacteroides spp. This ORF belongs to the non-replicating Bacteroides unit 1 (NBU1), a 10.3-kbp integrated element. NBU1 can be excised and mobilized in trans by tetracycline-inducible Bacteroides conjugative transposons [22,23]. The elements responsible for integration and excision have been identified [24-26], but orf8 is probably not involved in these processes. The G+C content of orf6, orf7 and orf8 (35%) differs markedly from that of other Bacteroides genes (42%). This difference suggests a recent acquisition, possibly involved in a yet-undiscovered transposition process. The presence of NERD in a single archaeal species and in only two plant species supports such a transposon-type transfer of the domain.

A more detailed prediction can be made based on the similarity in domain organization between NERD proteins that contain the HRDC domain and the N-terminal region of exonuclease proteins that contain the HRDC domain [...] further supported by distant homology between NERD and the COG0792 family. COG0792 is a predicted endonuclease family distantly related to archaeal Holliday junction resolvase, members of which are involved in DNA replication, recombination and/or repair. This homology is predicted by the profile-profile search algorithm FFAS (fold and function assignment system) [27], albeit with low statistical significance. Several fold-recognition algorithms (e.g. Superfamily and BASIC) [27-29] identify matches to Holliday junction resolvase structures (PDB codes: 1gefA and 1hh1A) with statistically significant scores [30,31].

Figure 1 shows the alignment between the NERD and COG0792 families and the sequence of the Holliday junction resolvase (PDB code: 1gefA); both alignments were obtained with the FFAS algorithm [27]. The alignment covers only the N-terminal half of NERD, and the 3D model of this region is shown in Figure 3. Interestingly, all active-site residues of the resolvase are conserved in most NERD family members, which strongly supports the functional prediction. These residues are marked by black arrows in the alignment and shown in atomic detail in Figure 3. The common denominator of all these predictions suggests a nuclease function for NERD.

Concluding remarks

We have discovered a novel domain, NERD, with a predicted connection to DNA processing. Genomic context analysis and distant homology analysis suggest a nuclease function.

The finding of this domain is important for understanding anthrax virulence. On the anthrax virulence plasmid, pXO1-01 lies close to other ORFs related to DNA processing. This location suggests an orchestrated function of the products of these genes. Is this machinery an anthrax DNA-remodeling system, or is it involved in the attack on the eukaryotic cell? Further advances in studies of the NBU1 element may reveal its function.

NERD is present in only a few non-bacterial species. This distribution suggests that the domain might be involved in some mobility processes, and also that the transfer between species must have happened quite recently.
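The G+C comparison between orf6-orf8 (35%) and other Bacteroides genes (42%) described above rests on a simple calculation. The sketch below assumes sequences are plain nucleotide strings and ignores ambiguity codes. The two example fragments are made up for illustration.

```python
def gc_content(seq):
    """Fraction of G+C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return gc / (gc + at)

# Made-up fragments for illustration only, not Bacteroides sequences.
insert_like = "ATGAAATTAGCATTAGAAAT"
host_like = "ATGGCAAGCTTAGCGAAATC"
print(f"{gc_content(insert_like):.0%} vs {gc_content(host_like):.0%}")
```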
Figure 1. An alignment of a sample set of NERD (nuclease-related domain) sequences. The alignment was generated using AliBee (http://www.genebee.msu.su/services/malign_full.html) [32] and colored in BioEdit [33]. PSI-BLAST [8] searches of the nonredundant protein database were performed with default parameters, using the Bacillus anthracis pXO1-01 protein (gi: 10956248) as query. After five rounds of searching, representatives of all subgroups of NERD were found. The highest E value was 2e-06. The uppermost group is composed of prokaryotic proteins, the middle protein is the sole example of an archaeal NERD-containing protein and the lowermost are two plant proteins. The shading threshold is 40%. The alignment is colored for identity and similarity using the default BioEdit amino acid similarity scoring matrix. The secondary-structure prediction for pXO1-01 combines the results of PSIPRED [34], Sam-T99-2d [35] and Profsec [36] at the MetaServer (http://bioinfo.pl/Meta/) [28]. The results for other members of the NERD family are almost identical. Arrows indicate the conserved residues that are important for the endonuclease activity of resolvases. Two shorter sequences (Nitrosomonas europaea Q82W50 and Pseudomonas aeruginosa Q9I5W3) are encoded by genomic sequences containing a stop codon. BLASTX [37,38] at the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/BLAST/) found no sequence homology beyond the stop codon. It seems that the N-terminal catalytic domain is sufficient for their function.

Sequences shown are given as species name and gi number, followed by the first and last aligned positions in parentheses:
Banthracis(pXO1)10956248, Bacillus anthracis Q8KYT4 (29-146); Buniformis8308027, Bacteroides uniformis Q9KIA1 (24-141); Presinovorans27228636, Pseudomonas resinovorans Q8GHQ8; Banthracis21397560, Bacillus anthracis Q81XB5 (41-162); Ttengcongensis20807162, Thermoanaerobacter tengcongensis Q8RBY3 (62-186); Mpulmonis15828796, Mycoplasma pulmonis Q98QN6 (59-177); Vcholerae15640829, Vibrio cholerae Q9KTS7 (18-141); Cperfringens18309656, Clostridium perfringens Q8XML4 (55-181); Paeruginosa4406504, Pseudomonas aeruginosa AAD20003 (58-188); Oiheyensis23100758, Oceanobacillus iheyensis Q8ELC6 (37-156); Dhafniense23118062, Desulfitobacterium hafniense ZP_00101791 (39-170); Bfungorum22982387, Burkholderia fungorum ZP_00027654 (41-167); Dradiodurans15806760, Deinococcus radiodurans Q9RTK3 (18-135); Oiheyensis23098248, Oceanobacillus iheyensis Q8ES50 (37-150); Neuropaea30248850, Nitrosomonas europaea Q82W50 (102-194); Paeruginosa15595770, Pseudomonas aeruginosa Q9I5W3 (33-111); Dhafniense23111400, Desulfitobacterium hafniense ZP_00097061 (69-184); Scoelicolor21224924, Streptomyces coelicolor O86560 (12-134); Soneidensis24373036, Shewanella oneidensis Q8EGX7 (10-130); Tfusca23019041, Thermobifida fusca ZP_00058754 (109-224); Styphi10957304, Salmonella typhi Q9L5M7 (31-150); Tfusca23019341, Thermobifida fusca ZP_00059052 (109-224); Soneidensis24372091, Shewanella oneidensis Q8EJH0 (9-121); Oiheyensis23099558, Oceanobacillus iheyensis Q8EPJ8 (106-223); Mmagnetotacticum23013346, Magnetospirillum magnetotacticum (16-134); Mthermautotrophicus15678494, Methanothermobacter thermautotrophicus O26566 (128-244); Athaliana15220924, Arabidopsis thaliana Q9SS58 (30-153); Osativa18266637, Oryza sativa Q8W3G9 (35-152).

This multiple sequence alignment (alignment number ALIGN_000650) has been deposited with the European Bioinformatics Institute (ftp://ftp.ebi.ac.uk/pub/databases/embl/align/ALIGN_000650).
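The 40% shading threshold in the Figure 1 caption can be illustrated with a simplified, identity-only version of column shading. This sketch rests on assumptions. BioEdit also shades by similarity using its scoring matrix, which is not modeled here. The toy alignment is hypothetical.

```python
from collections import Counter

def shade_columns(alignment, threshold=0.40):
    """Mark columns whose most frequent residue occurs in >= threshold of sequences.

    Identity-only simplification; gaps ('-') never count toward the majority,
    but every sequence counts in the denominator.
    """
    n_seqs = len(alignment)
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("aligned sequences must have equal length")
    mask = []
    for col in range(length):
        residues = [s[col] for s in alignment if s[col] != "-"]
        top = Counter(residues).most_common(1)
        shaded = bool(top) and top[0][1] / n_seqs >= threshold
        mask.append("*" if shaded else " ")
    return "".join(mask)

toy = ["EQVDLK", "EEIDVR", "EQLDAK", "-QVDIK", "DEVDME"]  # hypothetical sequences
print(shade_columns(toy))
```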




Figure 2. The domain architecture of NERD (nuclease-related domain)-containing proteins. In all multidomain proteins, NERD is located at the N terminus. All domains were recognized using the Simple Modular Architecture Research Tool (SMART) server (http://smart.embl-heidelberg.de/ or http://smart.ox.ac.uk/) [39]. For long proteins, domain sizes are not drawn in proportion to protein length.



Figure 3. The predicted structure of NERD (nuclease-related domain). The pXO1-01 model was obtained with the Modeller comparative modelling suite [40], on the basis of the FFAS (fold and function assignment system) [27] alignment. The ribbon diagram was prepared using PyMOL [41].



Appendix  3 


Discovery, crystal structures and characterization of a putative CO2 sensor domain, “BACO”: In computational searches of the sequences of the pXO1 and pXO2 plasmids, we noticed a previously poorly characterized ORF (number 118 in pXO1) within the pXO1 pathogenicity island. We searched GenBank with the amino acid sequence of the pXO1-118 protein, using the BLAST and PSI-BLAST programs with default values. This search identified a homologue on the B. anthracis pXO2 plasmid (protein pXO2-61) with an e-value of 4e-35 in the first iteration. The arrangement of these genes on pXO1 and pXO2 is strikingly similar. In both cases, the gene lies next to the activator (AtxA or AcpA) and is transcribed in the opposite direction. Recently, the Koehler group (Bourgogne et al., Infect. Immun. 71: 2736-2743 (2003)) has shown that AtxA upregulates gene expression on both plasmids. The most upregulated gene on the pXO2 plasmid is pXO2-61, suggesting that it plays an important role in virulence.
Figure: A new sensory domain, BACO, found in several Bacillus species. In all cases except pXO1 and pXO2, it is part of a sensor histidine kinase protein implicated in sporulation. This domain is present on both the virulence plasmid pXO1 and the capsule plasmid pXO2 of Bacillus anthracis, which suggests its involvement in virulence. The lower panel shows that the gene organization in the neighborhoods of pXO1-118 and pXO2-61 is strikingly similar: in both cases, the gene lies adjacent to a putative response regulator.


We found a second homolog in the B. anthracis chromosome (RBAT07138/A2012). In this case, the protein has the organization typical of a sensor histidine kinase. The sequence of the kinase domain suggests that it phosphorylates Spo0F in the phosphorelay system that triggers sporulation. Three other homologs were found in Bacillus species (one each in B. stearothermophilus, B. thuringiensis and B. cereus). The homology with the pXO1-118 protein reached an e-value of 2e-18 after the first round of searching. The last similar protein was discovered in the genome of B. stearothermophilus. It is also a histidine kinase, but it is more distantly related (e-value = 2e-11).
Figure: Sequences of B. subtilis sporulation sensor histidine kinases around the phosphohistidine site, and of B. anthracis kinases implicated in sporulation by sequence similarity. The new member of this family, which contains a BACO sensor domain, is shown in red.


C.2.b Crystal structure of pXO1-118/BACO-1: We expressed BACO-1 in E. coli as a His-tagged protein. The tag was cleaved with thrombin, and the protein was purified using a Ni affinity column and a Superdex75 gel filtration column. On SDS-PAGE, the protein runs as expected, with a M.W. of ~17 kDa; on a sizing column, it runs as a dimer. Crystals with typical dimensions of 0.1 mm x 0.05 mm x 0.05 mm were grown in space group P3221. Native and SeMet-MAD data sets to 2.5 Å were collected using synchrotron radiation. Phase information from the SeMet-MAD data was improved by solvent flattening and phase extension to 1.85 Å, which yielded an interpretable electron density map. The asymmetric unit contains one molecule; the molecular two-fold axis coincides with a crystallographic dyad. Model building and refinement were carried out in the programs O and CNS. The final BACO-1 model consists of residues 2-147, with Rfree = 25% and appropriate stereochemistry. The BACO-1 structure reveals a bundle of five helices (see Figure).
Figure: Crystal structure of the pXO1-118 dimer at 1.8 Å resolution. The co-factors are shown in red. Figure 16: Electron density for the fatty acid (red stick), perhaps myristic acid, bound in the hydrophobic core of each monomer. (Right panel) Close-up of the electron density near Arg 73, showing strong density connected to a fatty acid at right, and an unknown additional density (red circle).


The fold is most closely related to the globin fold, which defines a family of proteins that typically bind co-factors in a hydrophobic cavity at the center of the bundle. At this position, there is strong electron density that appears to be a fatty acid (perhaps myristic or oleic acid) with at least 13 C-C units. One end of this additional density lies next to a buried arginine (R73). R73 is part of a motif (KIAxER) that is invariant within this small family of sensor domains. On the other side of the arginine is a strong electron density feature that may be covalently bonded. This feature may be a bound anion; it could also represent bound CO2, since CO2 was not excluded during crystal growth. We are currently trying to identify the co-factor using mass spectrometry and NMR.


Figure:  Comparison  of  BACO-1  (left)  and  the  oxygen-sensing  domain  from  Bacillus  subtilis 
(right). 

Very recently, the structure of an oxygen sensor from Bacillus subtilis has been determined. Its fold is related to that of BACO-1, and it binds heme in a location similar to that of the BACO-1 co-factor. BACO-1 has no equivalent of the heme-linked histidine, so it cannot be a heme-binding protein. The oxygen sensor and BACO-1 can be superimposed with an RMSD of 1.8 Å for 70 Cα atoms. However, the two proteins have different helical extensions at the N and C termini, and their helices pack differently around the larger heme co-factor. Thus the BACO fold is a subfamily within a larger family of globin-like sensor domains.
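An RMSD after optimal superposition, such as the 1.8 Å over 70 Cα atoms quoted above, is conventionally computed with the Kabsch algorithm. Below is a minimal NumPy sketch. It assumes the two Cα sets are already paired residue by residue; the structural alignment step that selects the 70 equivalent pairs is not shown. The synthetic coordinates are for self-checking only.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between paired (N, 3) coordinate arrays after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two paired (N, 3) arrays")
    P_c = P - P.mean(axis=0)                  # remove translation
    Q_c = Q - Q.mean(axis=0)
    H = P_c.T @ Q_c                           # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against an improper rotation
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T   # optimal rotation taking P onto Q
    diff = P_c @ R.T - Q_c
    return float(np.sqrt((diff ** 2).sum() / len(P)))

# Self-check: a rotated, translated copy of 70 random "CA" positions.
rng = np.random.default_rng(0)
ca = rng.normal(scale=10.0, size=(70, 3))
theta = np.radians(30.0)
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta), np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]])
moved = ca @ rot.T + np.array([5.0, -2.0, 1.0])
print(round(kabsch_rmsd(ca, moved), 6))
```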

C.2.d NMR evaluation of BACO-1 as a CO2 sensor. We studied the effect of CO2 on pXO1-118 using 1D ¹H NMR. The aliphatic region of the spectra is shown in Figure 19 (upper panel). The protein resonances appear quite broad, suggestive of a large system. Nevertheless, the spectrum changes significantly upon binding CO2. A close-up of the amide region (Figure: lower panel) clearly shows new peaks appearing and old ones disappearing when CO2 is added to the system. Although these data are preliminary, they suggest that CO2 binding has a striking effect on the protein, consistent with a conformational change.


Figure: 1D ¹H NMR spectra of recombinant pXO1-118 in the presence (red) and absence (blue) of CO2. The upper panel shows the aliphatic region of the spectra. A zoom-in of the amide region (lower panel) shows new peaks appearing and old peaks disappearing (black, apo; red, +5 mM Na2CO3). The pH did not change (6.0, buffered by 50 mM KPi).
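Because CO2 was delivered as Na2CO3 at pH 6.0, only part of the added inorganic carbon is present as dissolved CO2. The sketch below gives a rough equilibrium estimate. It assumes a closed system with no CO2 lost to the air. It also assumes textbook pKa values for carbonic acid at 25 °C (about 6.35 and 10.33); ionic strength and temperature shift these values. The sketch is not part of the reported experiment.

```python
def carbonate_fractions(pH, pKa1=6.35, pKa2=10.33):
    """Equilibrium fractions of total inorganic carbon as CO2(aq)+H2CO3, HCO3-, CO3^2-.

    Assumes a closed system and textbook 25 degC pKa values (adjustable parameters).
    """
    h = 10.0 ** (-pH)
    k1 = 10.0 ** (-pKa1)
    k2 = 10.0 ** (-pKa2)
    denom = h * h + h * k1 + k1 * k2
    return h * h / denom, h * k1 / denom, k1 * k2 / denom

total_mM = 5.0  # Na2CO3 added in the NMR experiment
co2, hco3, co3 = carbonate_fractions(6.0)
print(f"CO2(aq) ~ {co2 * total_mM:.1f} mM, HCO3- ~ {hco3 * total_mM:.1f} mM")
```

Under these assumptions, roughly two-thirds of the added carbon would be present as dissolved CO2 at pH 6.0.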

C.2.e Cloning, expression and crystallization of BACO-2/pXO2-61: The gene encoding pXO2-61/BACO-2 was synthesized by GenScript Corporation and cloned into a pET-28a vector (Novagen). Following cleavage of the His-tag with thrombin, the solution was further purified using a Superdex75 gel filtration column. The protein runs on SDS-PAGE as expected, with a M.W. of ~16.4 kDa. On a sizing column, the estimated M.W. is ~17 kDa, suggestive of a monomer in solution. This behavior contrasts with that of BACO-1, which runs as a dimer.


Figure: pXO2-61/BACO-2 crystals grown from 100 mM Tris-HCl pH 7.0, 5% (w/v) PEG-1000. (Right panel) Diffraction pattern from a capillary-mounted crystal.


Purified BACO-2 was crystallized by microbatch under paraffin oil. Orthorhombic crystals were grown from 1 M NaI solution within 2 days. The space group is P212121, with unit cell parameters a = 44 Å, b = 62 Å, c = 124 Å. One high-resolution and one low-resolution data set were measured. They have been scaled together, and molecular replacement has been run with pXO1-118 as the search model. Data sets up to 1.5 Å have been collected using synchrotron radiation. The asymmetric unit contains two molecules. Model building and refinement were carried out in the programs O and CNS. The current BACO-2 model consists of residues 3-136, with Rfree = 27% and appropriate stereochemistry. As expected, the BACO-2 structure is very similar to that of BACO-1, although it lacks continuous density for a co-factor.
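The cell parameters and asymmetric-unit contents given above allow a quick sanity check of two molecules per asymmetric unit using the Matthews coefficient. This sketch assumes the ~16.4 kDa SDS-PAGE mass for each BACO-2 molecule. It uses the standard approximation: solvent fraction ≈ 1 - 1.23/V_M. It is a back-of-the-envelope check, not part of the reported analysis.

```python
def matthews_coefficient(a, b, c, n_asym_units, mols_per_asu, mw_da):
    """Matthews coefficient V_M (A^3/Da); cell volume a*b*c is valid only for 90-degree cells."""
    cell_volume = a * b * c  # A^3
    return cell_volume / (n_asym_units * mols_per_asu * mw_da)

def solvent_fraction(vm):
    """Approximate solvent content, 1 - 1.23 / V_M."""
    return 1.0 - 1.23 / vm

# BACO-2: P212121 has 4 asymmetric units per cell; 2 molecules per ASU; ~16.4 kDa each.
vm = matthews_coefficient(44.0, 62.0, 124.0, n_asym_units=4, mols_per_asu=2, mw_da=16_400)
print(f"V_M = {vm:.2f} A^3/Da, solvent ~ {solvent_fraction(vm):.0%}")  # V_M = 2.58 A^3/Da, solvent ~ 52%
```

A value of about 2.6 Å^3/Da lies within the range commonly observed for protein crystals, which is consistent with two molecules per asymmetric unit.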


Appendix  4 


Crystal structure of B. anthracis amidase homologous to a bacteriophage lysin

Bacteriophage lysin is a class of enzyme that a phage uses to break open its bacterial host in order to release its progeny particles. Lysin is an amidase that targets and breaks down peptidoglycan, an important cell wall cross-linking component in bacteria. Recently, the lysin (PlyG) from the gamma phage of Bacillus anthracis has been isolated. When applied as a purified protein to bacterial cultures, it proved lethal to the host (Schuch R, Nelson D and Fischetti VA (2002). A bacteriolytic agent that detects and kills Bacillus anthracis. Nature 418: 884-889). This discovery opened a new way in which anthrax could be treated and/or detected.

Genomic sequence analysis of B. anthracis revealed a gene (N-acetylmuramoyl-L-alanine amidase; EC 3.5.1.28) with high sequence homology to PlyG (82% identity; see the sequence alignment below). The N-terminal 160 amino acids, which contain the catalytic amino acids (catalytic domain), show only a few amino acid differences (93% identity). The catalytic Tyr and Lys are absolutely conserved, which suggests that the two proteins could have very similar catalytic mechanisms. The amino acid sequence differences appear mainly in the C-terminal region, which is thought to be a domain that binds bacterial cell-wall carbohydrates. It is not known whether the differences in the binding domain imply a different binding site on the bacterial cell wall. However, the highly homologous catalytic domain may imply that the two proteins originated recently from a single source.

Sequence comparison with the gamma phage PlyG

Score = 400 bits (1028), Expect = e-111, Identities = 194/234 (82%), Positives = 213/234 (91%), Gaps = 1/234 (0%)

PlyG  1    MEIQKKLVDPSKYGTKCPYTMKPKYITVHNTYNDAPAENEVSYMISNNNEVSFHIAVDDK  60
           MEI+KKLV PSKYGTKCPYTMKPKYITVHNTYNDAPAENEV+YMI+NNNEVSFH+AVDDK
_ami  1    MEIRKKLVVPSKYGTKCPYTMKPKYITVHNTYNDAPAENEVNYMITNNNEVSFHVAVDDK  60

PlyG  61   KAIQGIPLERNAWACGDGNGSGNRQSISVEICYSKSGGDRYYKAEDNAVDWRQLMSMYN  120
           +AIQGIP ERNAWACGDGNG GNR+SISVEICYSKSGGDRYYKAE+NAVDWRQLMSMYN
_ami  61   QAIQGIPWERNAWACGDGNGPGNRESISVEICYSKSGGDRYYKAENNAVDWRQLMSMYN  120

PlyG  121  IPIENVRTHQSWSGKYCPHRMLAEGRWGAFIQKVKNGNVATTSPT-KQNIIQSGAFSPYE  179
           IPIENVRTHQSWSGKYCPHRMLAEGRWGAFIQKVK+GNVA+ + T KQNIIQ+GAFSPYE
_ami  121  IPIENVRTHQSWSGKYCPHRMLAEGRWGAFIQKVKSGNVASATVTPKQNIIQTGAFSPYE  180

PlyG  180  TPDVMGALTSLKMTADFILQSDGLTYFISKPTSDAQLKAMKEYLDRKGWWYEVK  233
            PD +GAL SL MT   I+  +GLTY ++ PTSD QL+A KEYL+RK WWY+ K
_ami  181  LPDAVGALKSLNMTGKAIXNPEGLTYIVTDPTSDVQLQAFKEYLERKDWWYDDK  234
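In the BLAST report above, the middle line of each block marks identical residues with the residue letter and conservative substitutions with '+'. The identity and positive counts can therefore be recovered by counting. The sketch below uses hypothetical function names and is checked against the first block. The integer division mirrors the apparent truncation in the report, where 194/234 (82.9%) is shown as 82%.

```python
def count_identities_positives(midline):
    """Count identities (residue letters) and positives (letters plus '+') in a BLAST midline."""
    identities = sum(ch.isalpha() for ch in midline)
    positives = identities + midline.count("+")
    return identities, positives

def blast_style_percent(count, length):
    """Integer percentage, truncated rather than rounded."""
    return 100 * count // length

block1 = "MEI+KKLV PSKYGTKCPYTMKPKYITVHNTYNDAPAENEV+YMI+NNNEVSFH+AVDDK"
print(count_identities_positives(block1))
print(blast_style_percent(194, 234), blast_style_percent(213, 234))
```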


To explore the possibility of using the amidase as a defense or treatment against anthrax attack, the mechanism of the enzyme must be studied in detail. We are currently collaborating with Dr. Philip Hanna to test the bactericidal effects of the amidase. The chromosomal copy of a class II amidase, consisting of 234 amino acids (NP_657904), was cloned into the pET22 expression vector. The protein was expressed in BL21(DE3) using a standard protocol. It was purified on HiTrap SP-Sepharose (Amersham Pharmacia) and a Sephacryl 200 HR column (Amersham Pharmacia) with zinc-containing buffers. The DNA sequence and molecular weight were verified by automated DNA sequencing and MALDI mass spectrometry.

The full-length protein was soluble only up to a concentration of 5 mg/ml and did not crystallize under any of the 576 conditions tested. Limited proteolysis with elastase was then used to produce a smaller fragment. Mass spectrometry and peptide sequencing showed that the final product consisted of the first 159 amino acids. This fragment contains the conserved catalytic domain. The proteolytic reaction was scaled up, and the product was used for crystallization. The catalytic domain was crystallized in 1.2 M ...H2PO4/0.8 M K2HPO4, 0.1 M Tris-Cl pH 7.0. The crystals belong to space group P61, with cell dimensions a = 164.1 Å, c = 37.6 Å. Phases were generated using a selenomethionine (SeMet) derivative. The derivative was produced in M9 minimal medium supplemented with vitamins, glucose, ammonium sulfate and seleno-L-methionine. The elastase-digested product was then crystallized under conditions identical to those used for the native protein.

Multiple-wavelength anomalous diffraction (MAD) experiments on the SeMet protein were carried out at the Stanford Synchrotron Radiation Laboratory (SSRL). Three wavelengths were used: peak, high-energy remote and inflection, at 0.9792 Å, 0.8919 Å and 0.9794 Å, respectively. The SeMet-amidase diffracted to a resolution of 2.0 Å (94% completeness, R-factor of ...%). Three protein molecules were found in the asymmetric unit. Rebuilding and refinement were done using O and CNS. The current R-value is 0.3067, with an Rfree of ...5407 and a B-factor of 33.6399 Å2.
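The MAD wavelengths quoted above can be converted to photon energies with E (keV) ≈ 12.398 / λ (Å). This conversion makes it easier to see how the three wavelengths are placed around the selenium K absorption edge (about 12.66 keV). A minimal sketch:

```python
HC_KEV_ANGSTROM = 12.39842  # h*c expressed in keV*Angstrom

def wavelength_to_kev(wavelength_angstrom):
    """Photon energy in keV for a wavelength given in Angstrom."""
    return HC_KEV_ANGSTROM / wavelength_angstrom

mad_wavelengths = {"peak": 0.9792, "high-energy remote": 0.8919, "inflection": 0.9794}
for label, wl in mad_wavelengths.items():
    print(f"{label:>18}: {wl:.4f} A -> {wavelength_to_kev(wl):.3f} keV")
```

The peak and inflection energies differ by only about 3 eV, while the remote lies more than 1 keV above the edge.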


Description of the structure of B. anthracis amidase


Structural comparison between T7 lysozyme and amidase from B. anthracis
The structure consists of a six-stranded β-sheet surrounded by six helices. The overall fold and topology resemble those of the T7 enterobacteriophage lysozyme (1LBA; Cheng et al., Proc. Natl. Acad. Sci. USA 91: 4034-4038 (1994)), although the sequence identity with T7 lysozyme is only 12%. The amino acids coordinating the zinc ion are also similar in the two structures: two histidines, one cysteine (shown in bold in the above sequence alignment) and a water molecule. The active site of the enzyme is in the cleft near the zinc ion. T7 lysozyme uses Tyr-46 and Lys-128 for catalytic activity, whereas the B. anthracis enzyme probably uses Tyr-42 and Lys-...4. Further analysis of the structure is in progress.


Figure: Two views of the B. anthracis amidase, related by a 180° rotation. The Zn ion is shown as a gray ball.


Appendix  5 

Structural  studies  of  AtxA,  a  member  of  the  PRD  family  of  transcriptional 
activators. 

Modeling of AtxA: AtxA is the “master regulator” of virulence genes; however, its mechanism of action is unknown. Although recent literature reports have suggested that AtxA bears only limited sequence similarity to proteins of known structure, our analysis with the FFAS (Fold and Function Assignment System) tools reveals its domain organization with high confidence, as described below. AtxA is a member of a family of multidomain transcriptional activators and antiterminators that contain a “PRD” (phosphotransferase system (PTS) regulatory domain). All family members contain an N-terminal nucleic acid-binding domain and two PRD domains; the activators also contain a C-terminal PTS IIA or IIB domain.
Our modeling studies suggest that AtxA contains:

1) A DNA-binding domain at its N-terminus (residues 1-135) that is homologous to the diphtheria toxin (DT) repressor (>97% confidence). We built a 3D model of this region based on the crystal structure of the DT repressor in complex with DNA. This model reveals conserved basic residues on a helix-turn-helix scaffold that would engage the DNA phosphate backbone; it also reveals a conserved hydrophobic dimerization interface in domain 2. Thus, the modeling studies suggest that the N-terminal part of AtxA is a dimeric DNA-binding module.
Figure: Sequence alignments of the PRD domains of transcriptional activators. The vertical bar
indicates  conserved  residues,  of  which  the  four  histidines  are  phosphorylated  during  signal 
transduction.  Note  that  AtxA  contains  only  one  of  these  histidines. 

2) Two PRD domains (domains 3 and 4; residues 160-390). This is predicted with even higher confidence (>99%), and a reliable three-dimensional model can be built based on the crystal structure of the LicT transcriptional antiterminator [61]. In other members of this family, the duplicated PRD module is phosphorylated on four conserved histidines by a phosphotransferase system (PTS) in response to an environmental cue. The phosphorylations are thought to modify the stability of the dimeric proteins and thereby the RNA- or DNA-binding activity of the effector domain. However, sequence alignment shows that only one of the four histidines is conserved in AtxA.
Figure: Hypothetical model of AtxA bound to DNA, showing domain type at right and domain
numbering  at  left. 

3) At the C-terminus is a PTS IIB domain (residues 385-475; >97% confidence). Previously characterized members of the family have an invariant cysteine residue that is phosphorylated during signal transduction; AtxA lacks this phosphorylatable cysteine.

Taken  together,  this  modeling  exercise  suggests  that  AtxA  is  regulated  in  a  non-canonical 
fashion,  most  likely  not  by  phosphorylation  of  histidines  and  cysteines. 

Expression  constructs 

Constructs for expressing the following AtxA fragments as His-tag fusions in E. coli strain BL21(DE3) were produced:

Full-length, alone and co-expressed with pXO1-118
Putative DNA-binding domain: 1-141 and 1-160
Putative regulatory domains including the PRD homology domain: 141-393, 141-475, 162-393, 162-475
PTS IIB homology domain: 388-475


Expression  of  the  putative  DNA-binding  domain 

His-tagged AtxA 1-141 and 1-160 were expressed for 5 hours at 37°C and were found in the insoluble (inclusion body) fraction. After purification over a Superdex 200 10/30 gel filtration column (Amersham) under denaturing conditions, fragment 1-141 was refolded with the surfactant/cycloamylose method. A smaller amount of the same fragment was obtained in soluble form using traditional dialysis refolding methods.

To characterize possible DNA binding by this polypeptide in an electrophoretic mobility shift assay (EMSA), we generated DNA fragments from the promoter region of the pagA gene, one of the targets of AtxA, by PCR amplification.

Expression  of  full-length  and  the  PRD  homology  domain  of  AtxA 

His-tagged, full-length AtxA was expressed at 15 and 37°C, and the product was detectable only by SDS-PAGE followed by western blotting with monoclonal antibodies directed against a His5 tag (Novagen). The product is found in both the insoluble and the soluble fractions. When pXO1-118 and AtxA were co-expressed at 37°C, total AtxA increased but soluble AtxA did not, while pXO1-118 expression levels were unaltered, compared with expression of each protein alone. Similar results were obtained for the expression of AtxA 141-393 and 162-475 at 37°C. We are currently cloning shorter fragments within the regulatory region of atxA in order to identify short sequences that may be responsible for the low level of expression observed.

Expression  of  AtxA  (388-475) 

AtxA (388-475) was expressed in soluble form at satisfactory levels, as judged by SDS-PAGE.

Other  expression  systems 

We are currently cloning the protein fragments described above for expression in B. megaterium (MoBiTec), Sf9 cells and a cell-free system (RTS, Roche Applied Science). Full-length AtxA (1-475), as well as fragments 1-141 and 141-475, could be expressed in Sf9 cells at low levels (detectable by western blot), but with considerable degradation (ladders visible on western blots). To avoid the degradation, secreted expression of full-length AtxA is underway; the baculovirus has been generated and is being amplified.

AtxA 

Full-length AtxA (1-475), as well as the putative regulatory domain (141-393, 162-393, 141-475, 162-475), could be expressed in E. coli only at very low levels (detectable only by western blotting) despite considerable effort. Expression at lower temperatures, down to 15°C, improved solubility and increased the amount of full-length product, but not to satisfactory yields. Co-expression of full-length AtxA with pXO1-118 did not bring any meaningful improvement. The putative DNA-binding domain was expressed in the insoluble fraction and could not be solubilized in amounts suitable for DNA-binding studies. Cloning of the AtxA homologs AcpA and AcpB (about 25% identical and 50% similar to AtxA) is underway. An attempt to identify short fragments of AtxA that may be responsible for the low level of expression has also been started.
Identification of small molecule inhibitors of anthrax lethal factor

Rekha G Panchal, Ann R Hermone, Tam Luong Nguyen, Thiang Yian Wong, Robert Schwarzenbacher, James Schmidt, Douglas Lane, Connor McGrath, Benjamin E Turk, James Burnett, M Javad Aman, Stephen Little, Edward A Sausville, Daniel W Zaharevitz, Lewis C Cantley, Robert C Liddington, Rick Gussio & Sina Bavari

The virulent spore-forming bacterium Bacillus anthracis secretes anthrax toxin composed of protective antigen (PA), lethal factor (LF) and edema factor (EF). LF is a Zn-dependent metalloprotease that inactivates key signaling molecules, such as mitogen-activated protein kinase kinases (MAPKK), to ultimately cause cell death. We report here the identification of small molecule (nonpeptidic) inhibitors of LF. Using a two-stage screening assay, we determined the LF inhibitory properties of 19 compounds. Here, we describe six inhibitors on the basis of a pharmacophoric relationship determined using X-ray crystallographic data, molecular docking studies and three-dimensional (3D) database mining from the US National Cancer Institute (NCI) chemical repository. Three of these compounds have Ki values in the 0.5-5 μM range and show competitive inhibition. These molecular scaffolds may be used to develop therapeutically viable inhibitors of LF.


Anthrax, a disease caused by Bacillus anthracis, has recently been the subject of intense interest because of its use as a biological weapon against human populations. The inhalation of B. anthracis spores is often fatal if the condition is not properly diagnosed and treated with antibiotics during the early stages of infection. In many cases antibiotic regimens may not be effective, especially if there is bacterial overload, which causes large amounts of lethal toxin to be released. Hence, a new level of adjunct treatment is needed to inactivate the toxins released by B. anthracis.

Anthrax toxin (AT) consists of three proteins: lethal factor, protective antigen and edema factor, all of which work in concert to kill host cells. Initially, PA binds to an AT receptor1,2 on the host cell surface, where it is subsequently cleaved by furin (or furin-like proteases) to produce a 20-kDa N-terminal fragment (PA20) and a 63-kDa C-terminal fragment (PA63)3,4. After cleavage, seven PA63 monomers assemble to form a heptameric prepore capable of binding both LF and EF. Upon binding of LF or EF, the entire complex undergoes receptor-mediated endocytosis. It is hypothesized that the acidic endosomal environment causes a conformational change in the PA63 heptamer to produce a functional pore that traverses the membrane and translocates the two enzymatic moieties, LF and EF, into the cell cytosol. EF is a calmodulin-dependent adenylate cyclase5; LF is a Zn-dependent metalloprotease that cleaves several members of the MAPKK family near the N terminus6,7. This cleavage prevents interaction with, and phosphorylation of, downstream MAPK8, thereby inhibiting one or more signaling pathways. Through a mechanism that is not yet well understood, this results in the death of the host. Recent studies suggest that the inactivation of p38 MAPK induces apoptosis in LF-exposed macrophages, thereby preventing the release of chemokines and cytokines and preventing the immune system from responding to the pathogen9.

Based on the current understanding of the mechanism of anthrax toxin, methods may be developed to inhibit various steps in toxin assembly and/or function. In one antitoxin therapy approach, dominant-negative PA mutants have been generated that coassemble with the wild-type PA protein, blocking the translocation of LF and EF across the cell membrane. Such PA mutants are potent inhibitors of anthrax toxin in both cell-based assays and in vivo animal models10,11. In a second approach, a peptide inhibitor that binds to the heptameric PA and prevents the interaction of PA with LF and EF has shown efficacy in animals12.

The lethal action of anthrax toxin may also be inactivated by molecules that inhibit the protease activity of LF. So far, the only known small molecule inhibitors of LF are nonspecific hydroxamates that are effective at >100 μM concentration13 and, more recently reported, hydroxamate derivatives of a peptide substrate that inhibit LF at nanomolar concentrations14. In this study, we identified several small (nonpeptidic) compounds that inhibit anthrax LF protease activity with Ki values in the 0.5-5 μM range. We approached anthrax therapeutic development (in parallel with the peptidomimetic approach used by Turk et al.15; this issue) using structure-based discovery to identify small organic molecules as lead candidates. Specifically, we used molecular diversity screening combined with 3D database searching and molecular modeling. The LF X-ray crystal structure reported by Pannifer et al.16 was useful during the structure-based drug discovery portion of these studies.

Figure 1 A two-stage assay for screening and validating small molecule inhibitors of anthrax lethal factor. (a) Representative data from a fluorescent plate reader assay showing different degrees of inhibition by compounds from the NCI Diversity Set. (b) HPLC-based assay without inhibitor, showing the N- and C-terminal cleavage products after incubation of the substrate with LF for 30 min. (c) HPLC-based assay with inhibitor NSC 12155, showing a reduced C-terminal peak area at 365 nm, indicating strong inhibition of LF activity.

The first phase of this study involved a high-throughput screen (HTS) of small molecules from the NCI Diversity Set to identify LF inhibitors. Hits identified from the HTS were verified with an HPLC-based assay. Afterwards, we used X-ray crystallography and molecular modeling (conformational sampling, database mining and molecular docking) to identify additional lead therapeutics. Based on an iterative process of compound selection and biological testing, a pharmacophore for LF inhibitors was developed.

RESULTS 

High-throughput  screening  and  hit  validation 

To screen and identify compounds that inhibit LF activity, we developed a high-throughput fluorescence-based assay. An optimized peptide (KKVYPYPME; B.E.T. et al., unpublished data) with a fluorogenic coumarin group at the N terminus and a 2,4-dinitrophenyl (dnp) quenching group at the C terminus was used as the LF substrate for in vitro assays. After cleavage by LF, fluorescence increased (excitation and emission wavelengths, 325 and 394 nm, respectively). After standardization of the high-throughput assay, the 1,990 compounds in the NCI Diversity Set were tested (Fig. 1a). Compounds that showed >75% inhibition were selected for validation using an HPLC-based assay, which eliminated false positives due to fluorescence quenching by some of the test compounds. Using the HPLC-based assay (Fig. 1b,c), compounds that showed >50% inhibition were selected for further study. In addition to eliminating false positives, the HPLC assay was a more rigorous test of LF inhibition, as a lower inhibitor concentration (20 μM) was used (compared with the 100-μM concentration used in the fluorescence-based assay). Furthermore, the identified LF inhibitors did not inhibit a range of different proteases, confirming that these compounds are not promiscuous protease inhibitors (see Supplementary Fig. 1 online).
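The two-stage triage described above can be expressed as two successive filters. The sketch below uses the cutoffs from the text (>75% inhibition in the fluorescence screen at 100 μM, then >50% in the HPLC assay at 20 μM); the `ScreenResult` record, the function names and the compound labels are hypothetical, and it is an illustration of the selection logic rather than the authors' software.

```python
from dataclasses import dataclass
from typing import List, Optional

# Cutoffs taken from the text; the data structure itself is hypothetical.
HTS_CUTOFF = 75.0   # % inhibition, fluorescence assay, compound at 100 uM
HPLC_CUTOFF = 50.0  # % inhibition, HPLC assay, compound at 20 uM


@dataclass
class ScreenResult:
    compound: str
    hts_inhibition: float                    # stage 1 (plate reader)
    hplc_inhibition: Optional[float] = None  # stage 2; None if not re-tested


def stage1_hits(results: List[ScreenResult]) -> List[ScreenResult]:
    """Compounds passing the high-throughput fluorescence screen."""
    return [r for r in results if r.hts_inhibition > HTS_CUTOFF]


def validated_hits(results: List[ScreenResult]) -> List[str]:
    """Stage-1 hits that also pass the HPLC assay.

    Compounds that merely quench the coumarin fluorescence look active in
    stage 1 but show little inhibition by HPLC, so they drop out here.
    """
    return [
        r.compound
        for r in stage1_hits(results)
        if r.hplc_inhibition is not None and r.hplc_inhibition > HPLC_CUTOFF
    ]


results = [
    ScreenResult("cmpd-A", hts_inhibition=92.0, hplc_inhibition=88.0),
    ScreenResult("cmpd-B", hts_inhibition=81.0, hplc_inhibition=12.0),  # quencher
    ScreenResult("cmpd-C", hts_inhibition=40.0),                        # inactive
]
print(validated_hits(results))
```

Keeping the two stages as separate functions mirrors the assay design: the cheap plate-reader screen is applied to everything, and only its hits are carried into the more rigorous HPLC check.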


Figure 2 General pharmacophore model of the LF inhibitors. (a) Black dashed lines depict the distances between the various centroids of the pharmacophore centers. Green ellipses (A and B) are aromatic centers; red ellipses (C, D and E) are polar centers (hydrogen bond donors or acceptors); the blue region (F) is a neutral linker that may include a variety of polar or hydrophobic groups. (b) Pharmacophoric overlap of LF inhibitors (stick rendering) and their correspondence to the general LF inhibitor pharmacophore shown in Figure 2a. The pharmacophoric overlap regions of the compounds are highlighted in dashed lines (green, aromatic centers; blue, neutral linker region (polar or hydrophobic groups acceptable); red, polar centers). For all structures: nitrogen, blue; oxygen, red. Carbon atoms for NSC 12155, yellow; for NSC 357756, magenta; for NSC 369721, green; for NSC 369718, light blue. The pharmacophore is based on the energy-refined X-ray conformation of NSC 12155 bound to LF. These data were combined with molecular docking studies of structurally related analogs (Table 1) from 3D database mining studies.
Table 1 Two-dimensional chemical representations of LF inhibitors with percent inhibition at a compound concentration of 20 μM, Ki values and type of inhibition

NSC number    % inhibition    Ki (μM)       Inhibition type
12155         95              0.5 ± 0.18    Competitive
357756        90              4.9 ± 1.7     Competitive
369718        90              N.D.          N.D.
369721        90              4.2 ± 0.21    Competitive
359465        48              N.D.          N.D.
377362        33              N.D.          N.D.
240899        0               N.D.          N.D.

N.D., not determined.


Pharmacophoric  features  of  anthrax  LF  inhibitors 

We identified 19 compounds with >50% LF inhibition (at 20 μM inhibitor concentration) from the NCI Diversity Set screen. These included several organometallic and charged molecules. Here, we chose to concentrate only on relatively small organic compounds for structure-based studies, as these molecules are more likely to show therapeutic potential. The conformational spaces of two leads, NSC 12155 and NSC 357756, were subsequently explored to generate multiple pharmacophoric hypotheses, which were then used in 3D database mining studies to identify additional LF inhibitors. We carried out several iterations of this process, which consisted of 3D database mining of the entire NCI repository (as well as commercially available chemical repositories, including the Available Chemicals Directory, Maybridge and BioByte) followed by biological testing, to identify new inhibitors. During this process, >60 compounds were tested, and most of them were inactive. However, six of the compounds, which showed a range of LF inhibitory potency, were used to develop and refine a consistent pharmacophore (Fig. 2a). A 3D superimposition of four of the most potent LF inhibitors (NSC 12155, NSC 357756, NSC 369718 and NSC 369721) (Fig. 2b) shows an excellent overlay of the polar heteroatoms and hydrophobic substituents of these molecules. The chemical structures of a range of identified LF inhibitors are shown in Table 1.
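The 3D database mining described above amounts to asking whether a candidate conformer places its key features at the inter-centroid distances encoded in the pharmacophore. The sketch below illustrates only that core test: it assumes each feature (for example the aromatic centers A and B and a polar center C, following the labels of Fig. 2a) has already been reduced to a 3D centroid, and the target distances and 1-Å tolerance are hypothetical placeholders, not the values of the published model. Conformer generation and feature perception, which real 3D-search tools also perform, are omitted.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float, float]


def matches_pharmacophore(
    features: Dict[str, Point],
    constraints: Dict[Tuple[str, str], float],
    tolerance: float = 1.0,
) -> bool:
    """Return True if every constrained pair of feature centroids lies
    within `tolerance` (in Å) of its target distance.

    features:    centroid coordinates keyed by pharmacophore label
    constraints: target inter-centroid distances keyed by label pairs
    """
    for (a, b), target in constraints.items():
        if a not in features or b not in features:
            return False  # a required feature is absent from this conformer
        if abs(math.dist(features[a], features[b]) - target) > tolerance:
            return False
    return True


# Hypothetical pharmacophore: two aromatic centers (A, B) and one polar center (C).
constraints = {("A", "B"): 4.0, ("A", "C"): 3.0, ("B", "C"): 5.0}
conformer = {"A": (0.0, 0.0, 0.0), "B": (4.0, 0.0, 0.0), "C": (0.0, 3.0, 0.0)}
print(matches_pharmacophore(conformer, constraints))
```

A conformer either satisfies all distance constraints or is rejected; ranking partial matches by total deviation would be a natural extension but is not shown.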


Kinetic studies

To determine the Ki values and types of inhibition mediated by the inhibitors (competitive, noncompetitive or uncompetitive), we determined kinetic constants of the peptide substrate and compared them with those obtained in the presence of different inhibitor concentrations. The Km and Vmax values for the LF-catalyzed hydrolysis of the peptide substrate were 19 μM and 1.1 μmol min-1 mg-1 of LF, respectively. NSC 12155, NSC 357756 and NSC 369721 showed competitive inhibition (Table 1), as they had no effect on Vmax but increased the apparent Km with increasing inhibitor concentration (see Supplementary Fig. 2 online).

Anthrax LF-NSC 12155 cocrystal structure

The crystal structure of LF in complex with NSC 12155 (the most potent inhibitor) was determined at a resolution of 2.9 Å (electron density map, Fig. 3a). NSC 12155 binds to the catalytic site of LF with its urea moiety close to the catalytic Zn atom (within 4 Å). One quinoline ring shows strong electron density near the side chain of His690, suggesting a favorable π-stacking interaction between the histidine side-chain imidazole and the quinoline ring (Fig. 3b). Conversely, the second quinoline showed poor electron density, indicating that there is more rotational freedom about its quinoline-urea bond. Despite the overall lack of a strong positional preference for this quinoline, more consistent density was detected near its amino substituent, indicating a slightly greater preference for a 'C-shaped' conformation of NSC 12155 when bound to LF. This is consistent with the pharmacophoric overlap shown in Figure 2b.
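The competitive mechanism reported in the kinetic studies corresponds to the standard rate law v = Vmax[S]/(Km(1 + [I]/Ki) + [S]): the inhibitor raises the apparent Km but leaves Vmax unchanged. The sketch below evaluates this textbook equation with the Km (19 μM) and Vmax (1.1 μmol min-1 mg-1) given in the text and, purely as an illustrative input, the Ki of NSC 12155 from Table 1; the function and constant names are hypothetical.

```python
def apparent_km(km: float, inhibitor: float, ki: float) -> float:
    """Apparent Km under competitive inhibition: Km * (1 + [I]/Ki)."""
    return km * (1.0 + inhibitor / ki)


def competitive_rate(s: float, vmax: float, km: float,
                     inhibitor: float = 0.0, ki: float = float("inf")) -> float:
    """Michaelis-Menten rate with a competitive inhibitor (units follow the inputs)."""
    return vmax * s / (apparent_km(km, inhibitor, ki) + s)


KM_UM = 19.0   # uM, from the text
VMAX = 1.1     # umol min^-1 mg^-1 LF, from the text
KI_UM = 0.5    # uM, NSC 12155 (Table 1), used here only as an example input

for i_um in (0.0, 0.5, 1.0):
    km_app = apparent_km(KM_UM, i_um, KI_UM)
    v20 = competitive_rate(20.0, VMAX, KM_UM, i_um, KI_UM)
    print(f"[I] = {i_um} uM: Km,app = {km_app:.1f} uM, v at 20 uM substrate = {v20:.3f}")
```

At very high substrate concentration the rate approaches Vmax whatever the inhibitor concentration, which is the signature used above to classify these compounds as competitive.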


Figure 3 X-ray crystal structure of the LF-NSC 12155-Zn complex. The electron density maps surrounding NSC 12155 shown in these figures are 2Fo - Fc difference maps (see Methods) calculated at 2.9-Å resolution. (a) Detailed view of the electron density trace and overall model fit of NSC 12155. Molecular surface of LF colored by charge (red, negative; blue, positive), with Zn2+ (cyan), and the model of the inhibitor molecule NSC 12155 (yellow) in stick representation. The difference map, 2Fo - Fc, is contoured at the 1.1σ level. (b) The inhibitor NSC 12155 bound in the active site of LF. The difference map, 2Fo - Fc, is contoured at 1.0σ. A portion of NSC 12155 appears nonrigid owing to a rotatable bond, and almost full electron density coverage is seen for this portion at a contour level of 0.6σ. Inhibitor molecule (yellow). Zinc-coordinating residues (H686, H690, E735) and catalytic residues (E687, Y728) are in stick representation. The Cα atoms of residues 680-694 (green, background) and 726-742 (beige, foreground) are in ribbon representation. The Zn2+ ion (cyan) is a lined sphere, and its hydrogen bonds with His686, His690 and Glu735 are represented as aligned small white spheres. These figures were prepared using SPOCK (http://mackerel.tamu.edu/spock/).

Figure 4 Efficacy of LF inhibitors in a cell-based toxicity assay. J774A.1 cells were pretreated with either DMSO control or various concentrations of inhibitors and then incubated with anthrax lethal toxin. After 4 h, cell viability was determined with MTT dye.


Molecular  docking  studies 

To further investigate whether the C conformation has an important role in the binding of NSC 12155 to LF, we used molecular docking to study the conformational preference of the freely rotating quinoline in the NSC 12155-LF model. Results from these analyses suggest that the NSC 12155 scaffold prefers the planar C conformation to the 'L-shaped' conformation when bound to LF. This is further supported by the following: (i) quantum mechanical calculations at the level of density functional theory, as well as analysis of related crystal structures (data not shown), support a planar preference (either L- or C-shaped) for NSC 12155; (ii) rotation of the 'free' quinoline out of plane to its planar L conformation results in unfavorable hydrophobic-polar interactions between the amino groups of NSC 12155 and the side chain of Val675; (iii) in the planar C conformation, the urea oxo and quinoline amino substituents of NSC 12155 are more likely to engage in favorable intramolecular acid-base interactions; (iv) molecular docking studies of 32 substituted quinoline and urea derivatives (chemoinformatically mined from the NCI repository), which were inactive in the LF assay (data not shown), indicate that these scaffolds are either incapable of forming the preferred C conformation of NSC 12155 or lack features that would enable favorable binding; and (v) additional modeling studies of NSC 12155 indicate that the urea nitrogens are within range to form favorable acid-base interactions with the carboxylate of Glu687 (supported by X-ray data: the urea nitrogens of NSC 12155 are 4.12 Å and 4.72 Å from OE1 and OE2 of Glu687, respectively).

Cytotoxicity  assay 

To determine the ability of the small molecule inhibitors to protect macrophages against LF, we pretreated the cells with NSC 12155, NSC 357756, NSC 369718 or NSC 369721 at concentrations ranging from 1 to 100 μM and further incubated them in the presence of anthrax lethal toxin. Cell viability was determined using MTT dye (Fig. 4). NSC 357756 showed 96% protection at 100 μM, whereas NSC 12155 and NSC 369718, the most potent of the LF inhibitors in vitro, showed lower protection at 100 μM. These three compounds showed some protection at concentrations below 25 μM, suggesting that they might be good leads against lethal toxin in vivo. By contrast, NSC 369721 was ineffective even at 100 μM in the cell-based toxicity assay. The moderate protection afforded by these inhibitors is probably attributable to their limited ability to penetrate the macrophage cell membrane. The cell-based data will aid in the development of second-generation LF inhibitors.
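The protection values above are derived from MTT-based viability, but the text does not state how they were normalized. The sketch below therefore assumes one common convention: viability is the blank-corrected absorbance relative to untreated cells, and protection is the fraction of the toxin-induced loss of viability that inhibitor pretreatment restores. The function names and absorbance values are hypothetical.

```python
def percent_viability(a_sample: float, a_untreated: float, a_blank: float = 0.0) -> float:
    """Blank-corrected MTT absorbance as a percentage of untreated cells."""
    return 100.0 * (a_sample - a_blank) / (a_untreated - a_blank)


def percent_protection(viability_treated: float, viability_toxin_only: float) -> float:
    """Share of the toxin-induced viability loss restored by the inhibitor.

    This definition is an assumption for illustration; 100% means the
    inhibitor fully restores viability to the untreated level.
    """
    if viability_toxin_only >= 100.0:
        raise ValueError("toxin alone caused no loss of viability")
    return 100.0 * (viability_treated - viability_toxin_only) / (100.0 - viability_toxin_only)


# Hypothetical absorbances (blank already subtracted).
a_untreated = 1.00  # cells only
a_toxin = 0.20      # lethal toxin alone
a_treated = 0.60    # inhibitor pretreatment, then lethal toxin
v_toxin = percent_viability(a_toxin, a_untreated)
v_treated = percent_viability(a_treated, a_untreated)
print(f"protection = {percent_protection(v_treated, v_toxin):.0f}%")
```

Normalizing to the toxin-only well separates genuine protection from the baseline survival of cells exposed to toxin alone.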

DISCUSSION 

Molecular docking studies of both inactive and active analogs of the compounds shown in Table 1 are consistent with the common pharmacophore (Fig. 2a) proposed in this study. For example, the amidine groups of NSC 240899 formed unfavorable steric and polar interactions when docked in the NSC 12155-binding site, which may explain this compound's complete lack of LF inhibition despite its structural similarity to NSC 357756. NSC 357756, NSC 369718 and NSC 369721 did not engage in unfavorable interactions when docked in the NSC 12155-binding site, supporting this hypothesis. However, the large size and solvent-exposed nature of the LF-binding groove also allow NSC 357756, NSC 369718 and NSC 369721 to assume several different binding modes near the enzyme's active site.

The X-ray structure of the LF-NSC 12155 complex and the extensive molecular docking studies with LF inhibitors also allow for the identification of favorable structural modifications that may enhance the potency of these compounds. For example, X-ray and molecular modeling studies of NSC 12155 indicate that the 0.5-μM Ki of this inhibitor could be improved by replacing one of the quinoline moieties with a pyrrole. Such a modification would provide an additional hydrogen bond with the carboxylate of Glu687. The planar C conformation of NSC 12155 could be stabilized by replacing its amino substituents with nitro groups, thus facilitating resonance throughout this scaffold. Additionally, our study in concert with Turk et al.15 suggests that replacement of one of NSC 12155's quinoline rings with a tetra-aza-benzo[a]fluorene would enhance binding by placing additional molecular volume in the S1' site of LF. Moreover, the deep S1' pocket (visible in Fig. 3a, next to the zinc) seems highly selective, such that a large hydrophobic ring structure would probably increase the affinity of an inhibitor for the LF active site.

In summary, these studies describe a first critical phase in generating therapeutically viable, small molecule (nonpeptidic) countermeasures for anthrax lethal toxin. During the next phase of inhibitor optimization, information obtained from the cell-based assay will guide the incorporation of structural components that will increase inhibitor bioavailability, while at the same time allowing for optimal binding affinity in the LF substrate-binding cleft.

METHODS 

Diversity set. In brief, the NCI Diversity Set is a collection of 1,990 compounds chosen (from 71,756 open compounds in the NCI chemical repository with ≥1 g inventory) to cover a large, diverse range of molecular scaffolds and pharmacophore features, while also being relatively rigid (all compounds in the Diversity Set have five or fewer rotatable bonds, facilitating pharmacophore development and conformational sampling). For a detailed description of the Diversity Set compound selection and criteria see http://dtp.nci.nih.gov/branches/dscb/diversity_explanation.html.

Fluorescent plate-based assay. For high-throughput screening in 96-well plates, the reaction volume was 100 μl per well. Master mix containing 40 mM HEPES, pH 7.2, 0.05% (v/v) Tween 20, 100 μM CaCl2 and 1 μg ml-1 of LF was added to each well containing 100 μM of NCI Diversity Set compound. The reaction was initiated by adding the optimized peptide substrate (MCA-KKVYPYPME[dnp]K amide) to a final concentration of 20 μM. Kinetic measurements were obtained every minute for 30 min using a fluorescent plate reader (Molecular Devices, Gemini XS). Excitation and emission maxima were 324 nm and 395 nm, respectively.
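Because the plate reader records fluorescence every minute for 30 min, inhibition can be scored from initial rates rather than from single end-point readings. The sketch below shows one way to do this, assuming the early part of each progress curve is linear: it fits a least-squares slope over a chosen window of reads and expresses inhibition relative to an uninhibited control well. The window length, function names and fluorescence values are hypothetical; the original analysis procedure is not described in the text.

```python
from typing import Sequence


def initial_rate(times: Sequence[float], signal: Sequence[float], window: int = 10) -> float:
    """Least-squares slope of fluorescence vs time over the first `window` reads."""
    t = list(times[:window])
    y = list(signal[:window])
    n = len(t)
    if n < 2:
        raise ValueError("need at least two reads to estimate a slope")
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    return sxy / sxx


def percent_inhibition(rate_with_compound: float, rate_control: float) -> float:
    """100 * (1 - v_i / v_0); negative values indicate apparent activation."""
    return 100.0 * (1.0 - rate_with_compound / rate_control)


minutes = list(range(31))                    # one read per minute, 0-30 min
control = [5.0 + 10.0 * t for t in minutes]  # hypothetical RFU, LF without compound
treated = [5.0 + 2.5 * t for t in minutes]   # hypothetical RFU, LF with compound
v0 = initial_rate(minutes, control)
vi = initial_rate(minutes, treated)
print(f"{percent_inhibition(vi, v0):.1f}% inhibition")
```

Using a slope rather than one reading makes the score less sensitive to a compound's own background fluorescence, although it does not remove quenching artifacts, which is why the HPLC assay was still needed.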
Table 2 Data collection summary of LF-NSC 12155-Zn complex crystal

Resolution range (Å)       25.0–2.90
Reflections, total         175,849
Reflections, unique        55,384
Completeness (%)a          99.5 (59.3)
Rmerge (%)b                10.6 (49.6)
I/σ(I)a                    11.7 (2.9)

aValues in parentheses are for the highest-resolution shell. b$R_{\mathrm{merge}} = \sum |I - \langle I \rangle| / \sum \langle I \rangle$, where I
is the observed intensity and ⟨I⟩ is the average intensity from multiple observations of
symmetry-related reflections.
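The Rmerge definition in the footnote can be made concrete with a short sketch. It assumes each measurement is already tagged with the unique (symmetry-reduced) reflection it belongs to; the function name and data are invented for illustration, and real processing programs such as those in the HKL package apply additional conventions (for example, how singly measured reflections are treated) that are not reproduced here.

```python
from collections import defaultdict


def r_merge(observations):
    """R_merge = sum |I - <I>| / sum <I>, both sums over all measurements.

    observations: iterable of (hkl, intensity) pairs, where hkl labels the
    unique reflection that a symmetry-related measurement maps to.
    Summing <I> over every measurement of a reflection equals summing its
    individual intensities, so the denominator is the total intensity.
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)

    numerator = 0.0
    denominator = 0.0
    for intensities in groups.values():
        mean = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator


# Hypothetical measurements of two unique reflections.
obs = [((1, 0, 0), 100.0), ((1, 0, 0), 110.0),
       ((0, 1, 1), 50.0), ((0, 1, 1), 50.0)]
print(f"Rmerge = {100 * r_merge(obs):.1f}%")  # Rmerge = 3.2%
```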


HPLC-based assay. An HPLC-based assay was used to validate the hits from the
primary screen and to eliminate the false positives obtained owing to fluorescence
quenching. Reaction mix (30 µl total volume) containing 40 mM HEPES,
pH 7.2, 0.03% (v/v) Tween 20, 100 µM CaCl2, LF substrate (20 µM final
concentration), with or without the inhibitor (20 µM final concentration), was
incubated with LF (1 µg ml−1) for 30 min at 30 °C. The reaction was stopped by
adding 8 M guanidine hydrochloride in 0.3% (v/v) TFA. Substrate and
products were separated on a Hi-Pore C18 column (Bio-Rad) using 0.1% (v/v) TFA
(solvent A) and 0.1% (v/v) TFA + 70% (v/v) acetonitrile (solvent B). The
column effluent was monitored at 363 nm, where the substrate and C-terminal
cleavage products showed greater absorbance.

The HPLC-based assay was used for enzyme kinetic studies. Kinetic constants
were obtained from plots of initial rates with seven concentrations of the
substrate. For the best inhibitors, Ki and the type of inhibition were evaluated using
seven different concentrations of the substrate ranging from 2 to 40 µM and four
different concentrations of the inhibitor. Ki values for the competitive inhibitors
were calculated using the equation $K_i = [I] / (K_m^{\mathrm{app}} / K_m - 1)$, where [I] is the
inhibitor concentration and $K_m^{\mathrm{app}}$ is the apparent Km in the presence of inhibitor17.
Ki values in Table 1 are the averages ± s.d.
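As a worked sketch of the competitive-inhibition relation above: once the apparent Km has been measured at a known inhibitor concentration, Ki follows directly. The function name and numbers are hypothetical; in practice Km and the apparent Km would come from fits of initial rates at the seven substrate concentrations described.

```python
def ki_competitive(inhibitor_conc, km_apparent, km):
    """Ki = [I] / (Km_app / Km - 1) for a purely competitive inhibitor.

    All concentrations must share one unit; Ki is returned in that unit.
    """
    ratio = km_apparent / km
    if ratio <= 1.0:
        raise ValueError("Km_app must exceed Km for competitive inhibition")
    return inhibitor_conc / (ratio - 1.0)


# Hypothetical: 20 uM inhibitor raises the apparent Km from 10 uM to 30 uM.
print(ki_competitive(20.0, 30.0, 10.0))  # 10.0 (uM)
```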


LF refinement and inhibitor docking. The structure of LF was energy-refined
using the cff91 force field of the Discover program (Accelrys). Our strategy entailed
a step-down, template-forced minimization procedure with the Zn
coordination site fixed. This process was repeated until the coordinates of the final
model were within the experimentally determined X-ray crystallographic
resolution. The inhibitor-enzyme structure coordinates were subsequently
tether-minimized in the same manner as described above, and the final structure was
subjected to hydropathic analysis using HINT (eduSoft).

Conformer generation. Conformational models of inhibitors were generated
using Catalyst 4.7 (Accelrys). A 'best-quality' conformational search was used
to generate conformers within 20 kcal mol−1 of the global energy minimum.
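The 20 kcal mol−1 cutoff amounts to an energy-window filter relative to the lowest-energy conformer found. The sketch below shows only that filtering step, with a hypothetical function name and invented conformer energies; it is not Catalyst's conformer-generation algorithm, and it treats the lowest energy in the set as the global minimum, which is an assumption.

```python
def within_energy_window(conformers, window=20.0):
    """Keep conformers within `window` (kcal/mol) of the lowest energy.

    conformers: list of (name, energy_kcal_per_mol) tuples. The lowest
    energy in the list stands in for the global minimum.
    """
    e_min = min(energy for _, energy in conformers)
    return [(name, energy) for name, energy in conformers
            if energy - e_min <= window]


# Hypothetical conformer energies (kcal/mol).
confs = [("c1", -120.0), ("c2", -105.5), ("c3", -99.0)]
print([name for name, _ in within_energy_window(confs)])  # ['c1', 'c2']
```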


Data mining. Catalyst 4.7 (Accelrys) was used for all database mining. Briefly,
the imidazole rings of NSC 357756 were used to form a three-dimensional
search query (A.R.H. et al., unpublished data). Subsequent molecular docking
studies (see above) were used to suggest candidates for biological testing.


Quantum mechanical calculations. The L- and C-shaped conformations of
NSC 12155 were fully optimized (until the norm of the gradient was
<5.0 x lO1) using DGauss (Oxford Molecular Group). Local spin density (LSD)
correlation potentials were approximated by the Vosko-Wilk-Nusair method18,
and Gaussian analytical functions were used as basis sets. LSD-optimized
orbital basis sets of double-ζ split valence polarization quality19 were used. In
final optimizations, the BLYP exchange-correlation functional20,21 was applied
as a nonlocal gradient correction after each self-consistent field cycle.

Crystallization. Native, wild-type LF protein was crystallized using 13 mg ml−1
LF. Crystals were grown from 1.7 M (NH4)2SO4, 0.2 M Tris-HCl, pH 7.5–8.0,
2 mM EDTA, using hanging-drop vapor diffusion16. Monoclinic crystals
appeared after four days to two weeks and were then harvested for
experiments. The LF crystals belong to the monoclinic space group P21, with unit cell
dimensions a = 96.70 Å, b = 137.40 Å, c = 98.30 Å, α = γ = 90°, β = 98°, and
contain two molecules per asymmetric unit.
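The reported cell parameters fix the unit-cell volume through the standard monoclinic relation V = abc sin β, and dividing by the protein mass per cell gives the Matthews coefficient. The sketch assumes a molecular mass of about 90 kDa for LF (a round figure not given in this section) and four molecules per cell (two per asymmetric unit, two asymmetric units in P21); the function names are illustrative.

```python
import math


def monoclinic_cell_volume(a, b, c, beta_deg):
    """Unit-cell volume in cubic angstroms: V = a * b * c * sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))


def matthews_coefficient(volume, molecules_per_cell, mass_da):
    """V_M in cubic angstroms per dalton: V / (molecules per cell * mass)."""
    return volume / (molecules_per_cell * mass_da)


V = monoclinic_cell_volume(96.70, 137.40, 98.30, 98.0)
# P2_1 has two asymmetric units per cell, each holding two LF molecules.
# 90 kDa is an assumed approximate mass for LF, used only for illustration.
vm = matthews_coefficient(V, molecules_per_cell=4, mass_da=90_000)
print(f"V = {V:.3e} A^3, V_M = {vm:.2f} A^3/Da")
```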


LF-inhibitor complexes. LF native crystals were harvested from the hanging
drops in which they were grown, bathed in several rounds of fresh buffer
without EDTA containing 1.9 M (NH4)2SO4, 0.2 M Tris-HCl, pH 8.0, and left to
soak in this solution for a further 30 min. These crystals were then used to
obtain the protein-inhibitor-zinc complexes. All manipulations were done at
room temperature (23–26 °C).

The LF-NSC 12155-Zn complex was obtained by soaking an individual
native LF monoclinic P21 crystal in a solution of 1 mM ZnSO4,
1.9 M (NH4)2SO4, 0.2 M Tris-HCl, pH 8.0 for 5 min. The crystal was then
transferred to a solution of 1.0 mM NSC 12155, 1% (v/v) DMSO, 1.9 M
(NH4)2SO4, 0.2 M Tris-HCl, pH 8.0 for 15 min. Finally, the crystal was
transferred into a cryoprotectant solution of 1.0 mM NSC 12155, 2.4 M (NH4)2SO4,
0.2 M Tris-HCl, pH 8.0, 2 mM EDTA, 25% (v/v) glycerol, and soaked at room
temperature for 1 min. The crystal was then immediately mounted onto a
cryoloop and flash-frozen in liquid nitrogen. All data were collected at 100 K.

Data collection. Datasets for the LF complexes were collected at the Stanford
Synchrotron Radiation Laboratory (SSRL, Menlo Park, California, USA) on
beamline 9-1 (wavelength = 0.983 Å). X-ray diffraction data were collected for
the LF-NSC 12155-Zn complex to a resolution limit of 2.90 Å. Data collection
statistics are shown in Table 2.

Structure solution and refinement. Collected data were processed with the HKL
package22. Refinement and model building were done in CNS23 and O24,
respectively. Using PDB entry 1J7N as the starting model, the model of LF alone
was put through rigid-body refinement and then minimization before the
initial maps were calculated for model building and further refinement. Excess
electron density at 1.0σ indicated the binding location of the inhibitor in the
active site of LF. The model of the inhibitor was then built into this position and
further refined in CNS23. The final R-factors were Rfree = 27.38% and Rwork =
22.38%. The final model falls within or exceeds the limits of all the quality
criteria of PROCHECK from the CCP4 suite25.
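The Rwork and Rfree values quoted above are both crystallographic R-factors, R = Σ||Fo| − |Fc|| / Σ|Fo|, evaluated on different reflection subsets: Rwork on the reflections used in refinement and Rfree on a held-out test set. The sketch below shows that split with invented amplitudes and illustrative function names; it assumes the calculated amplitudes are already on the same scale as the observed ones, which refinement programs such as CNS handle internally.

```python
import numpy as np


def r_factor(f_obs, f_calc):
    """R = sum(| |Fo| - |Fc| |) / sum(|Fo|)."""
    f_obs = np.abs(np.asarray(f_obs, dtype=float))
    f_calc = np.abs(np.asarray(f_calc, dtype=float))
    return np.sum(np.abs(f_obs - f_calc)) / np.sum(f_obs)


def r_work_and_free(f_obs, f_calc, is_free):
    """Split reflections by the test-set flag and compute both R-factors."""
    f_obs = np.asarray(f_obs, dtype=float)
    f_calc = np.asarray(f_calc, dtype=float)
    is_free = np.asarray(is_free, dtype=bool)
    r_work = r_factor(f_obs[~is_free], f_calc[~is_free])
    r_free = r_factor(f_obs[is_free], f_calc[is_free])
    return r_work, r_free


# Hypothetical amplitudes; the last reflection belongs to the test set.
fo = [100.0, 200.0, 150.0, 50.0]
fc = [90.0, 210.0, 150.0, 60.0]
free = [False, False, False, True]
r_work, r_free = r_work_and_free(fo, fc, free)
print(f"Rwork = {r_work:.3f}, Rfree = {r_free:.3f}")  # Rwork = 0.044, Rfree = 0.200
```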

Cytotoxicity assay. J774A.1 cells were preincubated with DMSO control or
compounds for 30 min and then treated with PA (50 ng ml−1) and LF
(14 ng ml−1). After 4 h incubation with the toxin, 25 µl of MTT (1 mg ml−1) dye
was added, and the cells were further incubated for 2 h. The reaction was
stopped by adding an equal volume of lysis buffer (20% (v/v) DMF and 20%
(w/v) SDS, pH 4.7). Plates were incubated overnight at 37 °C, and absorbance
was read at 570 nm in a multiwell plate reader. Experiments were done in
duplicate and repeated three independent times for each of the inhibitors tested. The
results are the averages ± s.d.
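The section does not state how absorbance was converted to viability. A common normalization, assumed here, expresses each reading as a percentage of a toxin-free control after subtracting a cell-free blank; duplicate wells are averaged within an experiment, and the mean ± s.d. is taken across the three independent experiments. The function names and absorbances below are hypothetical.

```python
import statistics


def percent_viability(a570, a570_control, a570_blank=0.0):
    """Blank-corrected A570 as a percentage of the toxin-free control."""
    return 100.0 * (a570 - a570_blank) / (a570_control - a570_blank)


def summarize(experiments, a570_control, a570_blank):
    """Average duplicates per experiment, then mean and s.d. across experiments."""
    per_experiment = [
        percent_viability(statistics.mean(wells), a570_control, a570_blank)
        for wells in experiments
    ]
    return statistics.mean(per_experiment), statistics.stdev(per_experiment)


# Hypothetical duplicate A570 readings from three independent experiments.
runs = [(0.54, 0.56), (0.63, 0.65), (0.72, 0.74)]
mean, sd = summarize(runs, a570_control=1.0, a570_blank=0.1)
print(f"viability = {mean:.0f} +/- {sd:.0f}%")  # viability = 60 +/- 10%
```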

Coordinates. The coordinates and structure factors for the LF-NSC 12155-Zn
complex have been deposited in the Protein Data Bank (accession code 1PWP).

Note: Supplementary information is available on the Nature Structural & Molecular
Biology website.
The structural basis for substrate and inhibitor selectivity
of the anthrax lethal factor

Benjamin E Turk1,5, Thiang Yian Wong2,5, Robert Schwarzenbacher2, Emily T Jarrell1, Stephen H Leppla3, R John Collier4, Robert C Liddington2 & Lewis C Cantley1

Recent events have created an urgent need for new therapeutic strategies to treat anthrax. We have applied a mixture-based
peptide library approach to rapidly determine the optimal peptide substrate for the anthrax lethal factor (LF), a metalloproteinase
with an important role in the pathogenesis of the disease. Using this approach we have identified peptide analogs that inhibit the
enzyme in vitro and that protect cultured macrophages from LF-mediated cytolysis. The crystal structures of LF bound to an
optimized peptide substrate and to peptide-based inhibitors provide a rationale for the observed selectivity and may be exploited
in the design of future generations of LF inhibitors.


Inhalational anthrax progresses rapidly to a highly fatal systemic
infection1. The causative bacterium Bacillus anthracis secretes three
plasmid-encoded toxin proteins that contribute to pathogenesis:
protective antigen (PA), edema factor (EF) and lethal factor (LF)*. PA binds
to a cell surface receptor and forms an oligomeric pore that translocates
both EF and LF into the cytosol of target cells. The combination of PA
and LF is known as lethal toxin (LeTx), and intravenous delivery of
LeTx alone causes death in rodents2'’. In addition, B. anthracis strains
deficient in either component of LeTx are greatly attenuated, suggesting
an important role for the toxin in the disease4. As antibiotics alone
typically fail against systemic anthrax unless administered at an early stage,
LeTx has been proposed as a potential target for anthrax drugs to be
used with antibiotics in combination therapy1. Several experimental
approaches to LeTx neutralization based on inhibition of cellular LF
uptake have shown efficacy in animal models5,6.

LF is a zinc-dependent metalloproteinase that cleaves most MAP
kinase kinase (MKK) enzymes at sites near their N termini7–10.
Cleavage impairs the ability of the MKK to interact with and
phosphorylate its downstream MAP kinase substrates by disrupting or
removing a docking site known as the D-domain11. Inhibition of MAP kinase
pathways by LF impairs dendritic cell and macrophage function and
may help to establish infection9,12. Higher levels of toxin are cytotoxic
specifically to macrophages and probably contribute to fatality later in
the course of the disease1,2,13,14. Although the mechanisms by which
MKK cleavage leads to macrophage cell death are not entirely known,
p38 family MAP kinases seem to be required for survival of
macrophages upon activation by bacterial endotoxins15.

Efficient cleavage of MKKs requires interaction between an LF
exosite that has not yet been characterized and a region in the MKK
catalytic domain distal from the cleavage site16. However, mutation of
residues surrounding the scissile bond in MKKs abolishes proteolysis,
indicating that cleavage site recognition is also crucial to substrate
selection by LF7J\. Accordingly, LF can cleave short peptides, and
efficient substrates have been generated based on a consensus motif
derived from MKK cleavage sites17–19. It is not clear, however, which
positions surrounding the cleavage site are most critical for efficient
catalysis, nor whether residues found in MKKs are optimal for
cleavage by LF. Such information is important for the design of
therapeutically useful small molecule LF inhibitors, as thus far only rather long
(more than ten residues) peptide hydroxamates have been reported to
specifically inhibit LF19. Here we take an unbiased approach to the
discovery of LF substrates and inhibitors by selection from random pools
of millions of peptides, and report the crystal structures of LF in
complex with optimized substrates and small molecule peptide-based
inhibitors.

RESULTS 

Determination of the optimal peptide cleavage motif for LF

To gain insight into substrate recognition by LF and to facilitate the
development of LF inhibitors, we applied a mixture-based peptide
library approach that produces extended cleavage site motifs for
proteases20,21. Initially we prepared a partially degenerate peptide
mixture, acetyl-KKKPTPXXXXXAK (see Table 1 for explanation of
nomenclature), in which we fixed six positions with the residues found
N-terminal to the LF cleavage site in MKK-1 and followed them by a
number of degenerate positions. Partial digestion of the library with
LF followed by Edman sequencing of the mixture provided the
specificity for the positions C-terminal to the cleavage site (Table 1),
Medical School, 200 Longwood Avenue, Boston, Massachusetts 02115, USA. 5These authors contributed equally to this work. Correspondence should be addressed
Table 1 LF cleavage site specificity and cleavage sites of known protein substrates

Position  P6  P5  P4  P3  P2  P1  | P1' P2' P3' P4'
(cleavage position: between P1 and P1', marked |)

Consensus 

R  (2.1) 

K 

(2.0) 

K  (2.0) 

vu.sy 

Y  (3.1) 

P 

Y  (3.0) 

P  (1.9) 

N  (1.4) 

E (1.5) 

£  (2.1) 

R 

n.9) 

R  (1.9) 

pu.sr 

R  (1.6) 

L (2.2) 

0(1.4) 

M  (1.3) 

A  (1.5) 

K  (1.7) 

S 

(1.7) 

H  (1.6) 

FU.4J- 

F  (1.4) 

1(2,1) 

RU.4) 

H  (1.4) 

H 

(1.5) 

S  (1.4) 

A  (1.4)* 

L (1.3) 

M(i.8) 

K (1.3) 

r  (3.8) 

G {1.3} 

V (1.4) 

MKK-1     K   K   K   P   T   P   | I   Q   L   N
MKK-2     R   K   P   V   L   P   | A   L   T   I
MKK-3     R   K   K   D   L   R   | I   S   C   M
MKK-4     K   R   K   A   L   K   | L   N   F   A
MKK-4     F   K   S   T   A   R   | F   T   L   N
MKK-6     R   N   P   G   L   K   | I   P   K   E
MKK-7     P   R   P   T   L   Q   | L   P   L   A
MKK-7     P   R   H   M   L   G   | L   P   S   T

Positions surrounding the scissile bond are defined as (...P3-P2-P1-P1'-P2'-P3'...), where cleavage occurs between
the P1 and P1' residues. Top: LF selectivity as determined using the peptide libraries acetyl-KKKPTPXXXXXAK (for
the P1'-P4' positions) and MXXXXXPYPMEDK(K-biotin) (for the P6-P2 positions). Selectivity values were determined
by dividing the molar amount of a given residue within a sequencing cycle by the average molar amount of all residues
within that cycle, so that a value of 1 is average and would thus indicate no selectivity. Only positive selections of
≥1.3 are shown. Values at the P3 position marked with an asterisk reflect the proportional increase of that residue
from the previous cycle. Bottom: Residues present at positions surrounding the LF cleavage sites in MKK proteins.
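The selectivity calculation described in this footnote is a per-cycle normalization: each residue's molar yield is divided by the mean yield of all residues in that Edman cycle, and values of at least 1.3 are reported. A minimal sketch with illustrative function names and a hypothetical, truncated cycle (real cycles quantify many more amino acids, and the asterisked P3 treatment is not reproduced):

```python
def selectivity_values(cycle_pmol):
    """Divide each residue's molar amount by the cycle's mean molar amount."""
    mean = sum(cycle_pmol.values()) / len(cycle_pmol)
    return {aa: amount / mean for aa, amount in cycle_pmol.items()}


def positive_selections(cycle_pmol, threshold=1.3):
    """Residues with selectivity >= threshold, strongest first."""
    values = selectivity_values(cycle_pmol)
    hits = [(aa, v) for aa, v in values.items() if v >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Hypothetical picomole yields for one sequencing cycle.
cycle = {"Y": 6.0, "L": 4.5, "A": 2.0, "G": 1.5, "S": 1.0}
print(positive_selections(cycle))  # [('Y', 2.0), ('L', 1.5)]
```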


To obtain selectivity information for sites N-terminal to the scissile
bond, we constructed a secondary library, MXXXXXPYPMEDK(K-biotin),
in which we fixed the residues most highly selected by LF at
the primed positions. We also fixed proline at the P1 position, as an
MKK-1 mutant bearing alanine at this position is not cleaved by LF7.
Partial cleavage of this library was followed by removal of the
undigested peptides and C-terminal fragments with immobilized avidin.
Sequencing of the N-terminal fragments subsequently provided the
specificity for LF at the unprimed positions (Table 1).

LF seems to be most selective at the P1' position (immediately
C-terminal to the scissile bond), where the enzyme requires a
hydrophobic amino acid and can accommodate both aliphatic and
aromatic residues. Other features of the motif include a general
selection for hydrophobic residues at the P2 position and an unusual
selectivity for basic residues at multiple positions N-terminal to the
cleavage site. Notably, sequence comparisons and mutagenesis studies
have indicated that at least two basic residues and a downstream ΦXΦ
sequence (where Φ indicates a hydrophobic amino acid and X any
amino acid) are essential features of D-domains for mediating
interactions with MAP kinases22–24. This similarity provides an
evolutionary rationale for the targeting of these particular sites within the MKKs
by LF: adaptive mutations in MKKs that would render them
uncleavable would necessarily produce nonfunctional enzymes, thus making
the acquisition of anthrax resistance unlikely.

Although general features of the selected consensus LF cleavage
motif are reflected in the residues surrounding the cleavage sites
within the MKKs (Table 1), specific aspects of the motif, such as the
selection of tyrosine over other hydrophobic residues at the P1'
position, could not have been predicted based on consideration of known
cleavage sites. Accordingly, a ten-residue peptide based on the
consensus cleavage site (LF10) is cleaved ~50-fold more efficiently than an
analogous MKK-1 cleavage site-spanning peptide (Table 2). We
further substantiated the library selections by preparing additional
peptides with alanine substitutions at various sites within the consensus.
In each case, the substitution led to a substantial decrease in cleavage
efficiency (Table 2). An extended 15-residue consensus peptide (LF15)
provided a marked increase in cleavage efficiency over LF10,
while maintaining favorable spectral properties (an eight-fold increase
in fluorescence upon exhaustive cleavage). This peptide has the highest
specificity constant of any LF peptide substrate thus far reported17–19,
allows detection of very low quantities of LF, and should therefore be
useful in high-throughput screens for LF inhibitors.

Evaluation of peptide-based LF inhibitors

Substrate-derived inhibitors for metalloproteinases have been produced by
incorporating a metal-chelating group either at the C terminus of a peptide
corresponding to the unprimed positions or at the N terminus of a peptide
covering the primed positions25,26. As LF has substantial selectivity on
either side of the scissile bond, we prepared both types of inhibitors and
tested them for their ability to inhibit cleavage of the consensus peptide by
LF. As in a previously reported study19, we found that a relatively long
C-terminal peptide hydroxamate is a potent LF inhibitor, whereas short
peptide analogs such as acetyl-KVYP-hydroxamate inhibit the enzyme
poorly (Table 3). Conversely, measurable inhibition was found with a
small compound incorporating primed-side residues, 2-thioacetyl-YPM-amide
(SHAc-YPM, Table 3). This compound bears an N-terminal metal-chelating
group followed by a hydrophobic residue at the P1' position, an arrangement
shared by compounds previously reported to inhibit matrix
metalloproteinases (MMPs)27,28. This relationship prompted us to test
several similar MMP inhibitors for potency against LF. One such
compound, GM6001 (3-(N-hydroxycarboxamido)-2-isobutylpropanoyl-Trp-methylamide)29,
an N-terminal hydroxamic acid with a P1' leucine mimetic, a P2' tryptophan
and a C-terminal methyl group, inhibited LF more potently than did the
other compounds tested (Table 3 and data not shown). The enhanced potency
of GM6001 over SHAc-YPM, despite the presence of predicted suboptimal
residues, is probably attributable to the favorable substitution of the
hydroxamic acid moiety for the thioacetyl group28,30.


Table 2 Catalytic parameters for cleavage of substrate peptides by LF

Peptide        Sequence                   kcat/Km (M−1 s−1)
MKK-1          Mca-KKPTPIQLN-Dnp          2,500 ± 800
LF10           Mca-KKVYPYPME-Dnp          130,000 ± 20,000
LF10-P5 Ala    Mca-AKVYPYPME-Dnp          7,500 ± 500
LF10-P2 Ala    Mca-KKVAPYPME-Dnp          60,000 ± 10,000
LF10-P1' Ala   Mca-KKVYPAPME-Dnp          22,000 ± 2,000
LF15           Mca-RRKKVYPYPME-Dnp-TIA    4 × 10⁷ ± 1 × 10⁷

Residues in bold indicate substitutions to the consensus peptide. Substrate peptides
contain N-terminal Mca (7-methoxycoumarin-4-acetyl) fluorescent groups and Dnp
(2,4-dinitrophenyldiaminopropionic acid) quenching residues C-terminal to the
cleavage site, allowing reaction progress to be followed fluorometrically by observing
the increase in coumarin fluorescence upon cleavage (excitation 325 nm, emission
393 nm). For all peptides except LF15, kcat/Km was determined by measuring the
cleavage rate at 1 µM peptide (where [S] ≪ Km; [S] represents the concentration of
substrate). For the LF15 peptide, kcat (3.4 s−1) and Km (35 nM) were determined
individually by measuring the initial rate at various peptide concentrations. Values
reflect the average of three separate determinations ± s.d.
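The low-substrate method in this footnote rests on the Michaelis-Menten limit: when [S] ≪ Km, v = kcat[E][S]/(Km + [S]) ≈ (kcat/Km)[E][S], so the specificity constant is the measured rate divided by [E][S]. The sketch assumes the rate has already been converted from fluorescence units to molar units (for example, by calibration against fully cleaved substrate, which is not described here); the function names, kinetic constants and concentrations are hypothetical.

```python
def kcat_over_km_low_substrate(rate_m_per_s, enzyme_m, substrate_m):
    """kcat/Km (M^-1 s^-1) from v = (kcat/Km)[E][S], valid only when [S] << Km."""
    return rate_m_per_s / (enzyme_m * substrate_m)


def michaelis_menten_rate(kcat, km, enzyme_m, substrate_m):
    """Full Michaelis-Menten rate, used here to check the approximation."""
    return kcat * enzyme_m * substrate_m / (km + substrate_m)


# Hypothetical enzyme: kcat = 1.3 s^-1, Km = 100 uM, assayed at 1 uM substrate.
kcat, km = 1.3, 100e-6
enzyme, substrate = 10e-9, 1e-6
v = michaelis_menten_rate(kcat, km, enzyme, substrate)
estimate = kcat_over_km_low_substrate(v, enzyme, substrate)
print(f"estimate {estimate:.3g} vs true {kcat / km:.3g} M^-1 s^-1")
```

Because the estimate equals kcat/(Km + [S]), it falls about 1% short at [S] = Km/100 and the error grows as [S] approaches Km, which is why the condition [S] ≪ Km matters.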


NATURE STRUCTURAL & MOLECULAR BIOLOGY VOLUME 11 NUMBER 1 JANUARY 2004




© 2004 Nature Publishing Group http://www.nature.com/natstructmolbiol




Table 3 Potency of peptide-based LF inhibitors

Compound                   Ki,app (µM)
Acetyl-KVYP-hydroxamate    >100
PLG-hydroxamate            >100
MKARRKKVYP-hydroxamate     0.0011 ± 0.0002
SHAc-YPM                   11 ± 3
GM6001                     2.1 ± 0.2

Ki,app values were determined by measuring inhibition of peptide cleavage (1 µM LF15
for the 10-mer hydroxamate or 1 µM LF10 for all other compounds) over a range of
inhibitor concentrations. Values are the mean ± s.d. of three separate determinations,
each done in triplicate.
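The footnote does not specify the fitting model. A simple assumption is the single-site relation vi/v0 = 1/(1 + [I]/Ki,app), which linearizes to v0/vi − 1 = [I]/Ki,app; the sketch fits that line through the origin by least squares. The function name is illustrative and the data are synthetic. This treatment is not appropriate for tight-binding inhibitors whose Ki,app approaches the enzyme concentration, where a Morrison-type analysis would be needed instead.

```python
import numpy as np


def ki_app_from_activity(inhibitor_conc, fractional_activity):
    """Estimate Ki,app from vi/v0 using v0/vi - 1 = [I] / Ki,app.

    Least-squares slope through the origin; Ki,app = 1 / slope, in the
    same unit as inhibitor_conc.
    """
    x = np.asarray(inhibitor_conc, dtype=float)
    y = 1.0 / np.asarray(fractional_activity, dtype=float) - 1.0
    slope = np.dot(x, y) / np.dot(x, x)
    return 1.0 / slope


# Synthetic data generated with Ki,app = 2.0 uM.
conc_um = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
activity = 1.0 / (1.0 + conc_um / 2.0)
print(f"Ki,app = {ki_app_from_activity(conc_um, activity):.2f} uM")  # Ki,app = 2.00 uM
```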


Both SHAc-YPM and GM6001 inhibited cleavage of MKK proteins
by LF in vitro with potency comparable to their ability to inhibit
cleavage of the peptide substrate (Fig. 1a and data not shown). GM6001
also partially inhibited cleavage of MKKs in a LeTx-treated
macrophage cell line (Fig. 1b). Notably, LF inhibition by GM6001 in
cultured cells was sufficient to protect them from LeTx-induced cell
death (Fig. 1c,d). Neither the thioacetyl compound nor the long
C-terminal peptide hydroxamate was active in cell culture, presumably
owing to poor cell permeability or metabolic instability (data not
shown). We also found that the inhibitory potency of the C-terminal
peptide hydroxamate (but not that of any of the other compounds)
was substantially poorer when evaluated at physiological salt
concentrations, which are much higher than those of the standard assay
conditions for LF in vitro (data not shown). GM6001 could also prevent cell death
when added as late as 3 h after LeTx, suggesting that it can protect cells
subsequent to internalization of the toxin (Fig. 1e). These results
indicate that small molecule metalloproteinase inhibitors provide a means
to neutralize the biological activity of anthrax toxin.

Structures of LF in complex with peptides and inhibitors

To understand the molecular basis for substrate selectivity by LF and
to guide further inhibitor design, we solved the X-ray crystal structures
of LF in complex with a consensus peptide, LF20 (both in a zinc-free
state and in an active site mutant with zinc), and with two of the
inhibitors reported here, GM6001 and SHAc-YPM, both in the
presence of zinc (Fig. 2a–c and Table 4). Crystals soaked in the
MKARRKKVYP C-terminal hydroxamate showed additional electron density
around the active site, but this was not interpretable as a single atomic
model.

The LF20 peptide (MLARRKKVYPYPMEPTIAEG-amide) incorporates
consensus residues (P5–P4') surrounding the scissile bond
based on the peptide library screen, flanked by residues of authentic
MKK2. In the crystal structure of the zinc-free LF20 complex, nine
peptide residues (from the P3 valine to the P6' threonine) are defined
by electron density; in the zinc-bound active site mutant, the peptide
lies in the same location, and a further two residues at the N terminus
are visible (lysines P5 and P4), whereas residues downstream of the
cleavage site are in general less well defined, suggestive of partial
cleavage. The peptide binds in an extended conformation along the
40-Å-long substrate recognition groove (formed by domains II–IV)
that was previously defined by soaking an MKK2-derived peptide into
LF crystals31 (Fig. 2a,d,e). However, the present complex structure is at
substantially higher resolution than that of the earlier study, and, as
expected, LF20 binds more strongly than the MKK2 peptide. The
new crystallographic data unequivocally demonstrate that the binding
Figure 1 Inhibition of LF by GM6001. (a) GM6001 inhibits cleavage of MKKs by LF in vitro. Immunoblots show LF cleavage of MKK-3
and MKK-1 in J774A.1 lysates in the presence of varying concentrations of GM6001 or 10 mM o-phenanthroline, a metal chelator.
Cleavage of MKK-3 causes a mobility shift; the MKK-1 antibody is directed against the N terminus and does not react with the
cleavage product, resulting in disappearance of the band upon cleavage. (b) GM6001 inhibits MKK-3 cleavage in lethal toxin-treated
cells. Quantified western blot analysis of MKK-3 cleavage in J774A.1 cells treated with lethal toxin (0.5 µg ml−1 PA with the
indicated concentrations of LF) in the absence or presence of 100 µM GM6001. (c) Protection of J774A.1 cells from lethal toxin-mediated cell death by
GM6001. Cell viability as determined by MTT assay after lethal toxin treatment in the presence of 100 µM GM6001 or 0.2% (v/v) DMSO carrier. (d) Dose-dependent
neutralization of lethal toxin by GM6001. J774A.1 cell viability determined by MTT assay after treatment with lethal toxin (0.5 µg ml−1 PA +
0.3 µg ml−1 LF) or PA alone (0.5 µg ml−1) in the presence of the indicated concentrations of GM6001. (e) GM6001 protects J774A.1 cells when added
subsequent to LeTx. Cell viability is shown after treatment with PA alone (0.4 µg ml−1) or PA with LF (25 ng ml−1), with GM6001 added to 100 µM at the
indicated time after toxin addition.
Figure 2 Structures of LF in complex with peptides and inhibitors. Molecular surface of LF is colored by charge (red, negative;
blue, positive), with Zn2+ as a solid sphere (cyan) and the model of the peptide or inhibitor in ball-and-stick representation.
The individual electron density surrounding each molecule is a 2Fo − Fc difference map calculated at the respective final
resolution and contoured at 1.0σ. (a) LF20 (yellow) in the absence of Zn2+, resolution limit 2.85 Å. The model of bound LF20
shows the sequence VYPYPMEPT (residues 8–16 of the 20-residue-long LF20). This is the ordered region, and the electron density
is clearly visible in difference maps (2Fo − Fc and Fo − Fc) calculated from crystal X-ray diffraction data. (b,c) SHAc-YPM
(white, labeled YPM), resolution limit 3.50 Å, and GM6001 (green), resolution limit 2.70 Å, respectively. Continuous electron
density extends from the zinc atom to the metal-chelating moieties of the inhibitors (thioacetyl and hydroxamate, respectively).
(d) The superposed individual complex structures of all three target molecules from a–c in the substrate-binding groove of LF,
using the surface calculated for LF-LF20. The targets are all bound in the same N-to-C peptide orientation. (e) An overview of
LF bound to the targets LF20, GM6001 and SHAc-YPM, superposed and colored as in d. The molecular surface was calculated from
the LF-LF20 complex. The domains in LF are labeled I–IV. The catalytic site is in domain IV, where the zinc atom (not shown in
this figure) is bound. These figures were prepared using SPOCK (http://mackerel.tamu.edu/spock/).


mode conforms to the canonical thermolysin substrate-binding
mode32. The LF20 peptide is bound in a productive conformation, in
contrast to that previously inferred from the LF-MKK2 structure31,
where the peptide is bound in a nonproductive mode (the reverse
orientation and 6 Å distant from the active site). Therefore, the new
complex structures, Protein Data Bank (PDB) entries 1PWV and 1PWW,
supersede PDB entry 1JKY.

The ordered sequence of LF20 binds closely to the LF main chain
and secondary structures surrounding the catalytic zinc-binding site.
The P5 and P4 lysine residues lie close to a strongly acidic patch at the
entrance to the active site, rationalizing the preference for basic
residues at multiple positions upstream of the cleavage site. Residues
P3–P1 form antiparallel β-sheet-like interactions with strand 4β3 of
LF. The P2 tyrosine side chain occupies a fairly narrow hydrophobic
pocket; this may explain the preference for tyrosine at this site. The P1'
tyrosine residue is buried within a deep hydrophobic S1' pocket in LF,
adjacent to the active site center. The pocket expands substantially on
binding peptide (induced fit), including a ~3.5-Å shift of the main
chain at Glu676 at the bottom of the pocket. Additionally, there is a
~3.0-Å shift of the side chain of Phe329, which is positioned along the
substrate recognition groove in close proximity to the active site and
the bound peptide (this is also seen for all other bound ligands). The
depth and plasticity of the S1' cavity presumably allow the enzyme to
accommodate large hydrophobic residues at the P1' position; this
explains why LF is most selective at this site.

The SHAc-YPM inhibitor shares three residues with the LF20
peptide downstream of the cleavage site, and the corresponding peptide
electron density and derived model are markedly similar, with the P1'
tyrosine buried in the S1' pocket (Fig. 2b,d,e). The thioacetyl moiety
was modeled in a bidentate conformation33,34 with the carbonyl
oxygen atom and thiol sulfur atom directed toward the zinc. For the
LF(E687C)-GM6001-Zn2+ complex (Fig. 2c–e), where LF(E687C)
represents the LF E687C mutant, the inhibitor binds in a similar
location. We modeled the hydroxamate moiety in the conventional
bidentate planar conformation27,32,33,35–37, with the carbonyl and hydroxyl
oxygen atoms directed toward the zinc. The P1' side chain is a leucine
mimetic and binds in the S1' pocket. The smaller side chain induces
correspondingly less expansion of the S1' pocket. The tryptophan side
chain at the P2' position makes no specific contacts with the protein,
suggesting that it does not contribute to specificity.

DISCUSSION 

The three independent LF-complex structures reported here indicate several common features essential for optimized substrate and inhibitor binding. The long hydrophobic substrate-binding groove and the deep S1′ pocket adjacent to the catalytic Zn2+-binding site seem to be the main determinants of strong target affinity. This strong hydrophobic selectivity is also indicated by experimental data from the nonpeptidic small-molecule drug library screens of Panchal et al. (this issue). These structures will enable the design of compounds with greater complementarity to the S1′ pocket and substrate recognition groove, combined with metal-chelating groups spaced appropriately to allow for highly potent inhibition of LF.

Given the success of protease inhibition in the treatment of cardiovascular disease and AIDS, small-molecule LF inhibitors would seem to be the most likely source for new drugs to treat anthrax. The possibility of encountering either naturally occurring or engineered antibiotic-resistant strains suggests that the availability of such compounds would be crucial in minimizing potentially large numbers of deaths. The work described here creates many paths toward the production of such drugs, both by enabling the rapid screening of chemical libraries and by providing a structural basis for rational drug design. Our results suggest in particular that sizable libraries of MMP inhibitors already in existence are likely to contain additional LF inhibitors, perhaps with increased potency and specificity. This work also illustrates the utility of peptide libraries for both the rapid optimization of substrate peptides and the generation of lead compounds. Such methods should be generally applicable to any protease of interest as a therapeutic target.

Table 4  Data collection summary for LF-complex crystals

Columns: (1) LF-LF20; (2) LF(E687C)-LF20-Zn2+; (3) LF-SHAc-YPM-Zn2+; (4) LF(E687C)-GM6001-Zn2+

                              (1)           (2)           (3)           (4)
Data collection
  Space group                 P21           P21           P21           P21
  Cell dimensions (Å)
    a                         96.70         96.70         96.70         96.70
    b                         137.40        137.40        137.40        137.40
    c                         98.30         98.30         98.30         98.30
  Wavelength (Å)              1.07          0.98          1.08          0.97
  Resolution range (Å)        50.0-2.85     30.0-2.80     30.0-3.50     50.0-2.70
  Total reflections           96,701        94,088        91,831        255,861
  Unique reflections          55,398        54,931        28,731        72,275
  Completeness (%)a           92.2 (90.0)   86.8 (76.0)   50.8 (84.9)   99.6 (98.8)
  Rsym (%)a,b                 10.5 (48.6)   6.6 (40.9)    15.9 (45.1)   8.3 (48.0)
  I/σ(I)a                     6.7 (1.4)     12.2 (2.2)    7.4 (2.5)     15.6 (2.5)
Refinement statistics
  Rwork (%)c                  23.1          23.0          23.2          23.0
  Rfree (%)c                  28.3          27.7          29.5          26.8

aValues in parentheses are for the highest-resolution shell. bRsym = Σ|I - <I>| / ΣI, where I is the observed intensity and <I> is the average intensity from multiple observations of symmetry-related reflections. cR-factor = Σ||Fobs| - |Fcalc|| / Σ|Fobs|; Rwork represents reflections not in the Rfree set; Rfree represents a random selection of 5% of the data not used during refinement.
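The Rsym definition in footnote b of Table 4 can be made concrete with a short sketch. It assumes the unmerged intensities are already grouped by Miller index in a plain Python dictionary (a hypothetical layout, not the HKL package's file format) and, following one common convention, skips reflections measured only once. The intensities are invented for illustration.

```python
from collections.abc import Mapping, Sequence


def r_sym(observations: Mapping[tuple[int, int, int], Sequence[float]]) -> float:
    """Rsym = sum_hkl sum_i |I_i - <I>| / sum_hkl sum_i I_i.

    observations maps a Miller index (h, k, l) to the intensities of its
    symmetry-related measurements. Reflections measured only once are
    skipped (an assumed convention; programs differ on this point).
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in observations.values():
        if len(intensities) < 2:
            continue
        mean_i = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean_i) for i in intensities)
        denominator += sum(intensities)
    if denominator == 0:
        raise ValueError("no multiply-measured reflections")
    return numerator / denominator


# Toy data: two reflections, each measured more than once.
obs = {(1, 0, 0): [100.0, 110.0, 90.0], (0, 2, 1): [50.0, 50.0]}
print(f"Rsym = {100 * r_sym(obs):.1f}%")
```

Because the numerator sums absolute deviations from each reflection's mean, Rsym rises with weak, noisy high-resolution data, which is why the outer-shell values in parentheses are much larger than the overall values.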

METHODS 

Peptide library methods. Cleavage site selectivity for LF was determined by modification of previously described methods. Libraries were custom synthesized at the Tufts University Core Facility (Boston). Degenerate positions (X) were prepared using isokinetic mixtures to produce equimolar amounts of the 19 proteogenic amino acids excluding cysteine. For determination of the primed side selectivity, the library acetyl-KKKPTPXXXXXAK (1 mM) was digested with LF to 5-10% completion in a 10 μl reaction containing 20 mM HEPES, pH 7.4, 100 mM NaCl. The reaction products were analyzed by N-terminal peptide sequencing on an Applied Biosystems Procise 494 automated Edman sequencer. To determine the unprimed side selectivity, the library MXXXXXPYPMEDK(K-biotin) (20 μl at 1 mM) was digested to 5% completion as above, and the reaction was quenched by adding an equal volume of 10 mM o-phenanthroline. The reaction products were incubated in batches with 500 μl avidin agarose (Sigma) in 500 μl of 25 mM ammonium bicarbonate with tumbling for 1 h, at which time the slurry was transferred to a column. The flowthrough and wash were combined, evaporated under reduced pressure and analyzed by Edman sequencing as described above.
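One way to turn the Edman readout from these degenerate libraries into a per-position selectivity profile is sketched below. It assumes each sequencing cycle reports picomole yields for the 19 non-cysteine amino acids, and it expresses selectivity as the observed fraction of each residue divided by the equimolar expectation of 1/19. That normalization and the function name `selectivity_profile` are assumptions for illustration rather than details given in the text, and the yields are hypothetical.

```python
AMINO_ACIDS = "ADEFGHIKLMNPQRSTVWY"  # the 19 proteogenic residues, cysteine excluded


def selectivity_profile(cycles):
    """Return, for each Edman cycle, a dict of residue -> enrichment over equimolar.

    cycles: list of dicts mapping one-letter codes to yields (pmol) at that cycle.
    An enrichment of 1.0 means the residue appears at its equimolar frequency
    (1/19); values above 1 indicate a preference at that position.
    """
    expected = 1.0 / len(AMINO_ACIDS)
    profile = []
    for yields in cycles:
        total = sum(yields.get(aa, 0.0) for aa in AMINO_ACIDS)
        if total <= 0:
            raise ValueError("cycle has no recovered amino acids")
        profile.append(
            {aa: (yields.get(aa, 0.0) / total) / expected for aa in AMINO_ACIDS}
        )
    return profile


# Hypothetical cycle: equimolar except tyrosine, recovered at twice the level.
cycle = {aa: 1.0 for aa in AMINO_ACIDS}
cycle["Y"] = 2.0
profile = selectivity_profile([cycle])
print(round(profile[0]["Y"], 2), round(profile[0]["A"], 2))
```

Normalizing by the per-cycle total removes differences in overall recovery between cycles, so positions can be compared even though Edman yields decline with cycle number.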

Peptide cleavage assays. All peptides were synthesized at the Tufts University Core Facility except C-terminal peptide hydroxamates (Genemed Synthesis). Concentrations were determined based on the absorbance of the coumarin group (ε328 = 12,900 M-1 cm-1) for the peptides and on tyrosine absorbance (ε = 1,200 M-1 cm-1) for the inhibitors. Peptide cleavage assays were carried out in a Molecular Devices Spectramax Gemini XS fluorescence plate reader in black 96-well plates, using LF10 digested to completion (which results in a 12-fold increase in fluorescence) as a standard. Reactions were run at 25 °C in 20 mM HEPES, pH 7.4, 0.1 mg ml-1 BSA (plus 1 mM DTT for assays of the thioacetyl inhibitor or 0.01% (v/v) Brij 35 for assays of the ten-residue hydroxamate inhibitor). For kcat/Km determinations, LF was used at 2-20 nM and the rates were determined from the linear range of the reaction progress curve (<10% substrate turnover). For the LF15 peptide, rates were determined in a continuous assay at varying substrate concentrations using a Photon Technology International fluorescence system with 2 nM LF under the conditions described above, using the peptide at 1 μM digested to completion (eight-fold increase in fluorescence) as a standard. Data were corrected for the inner filter effect by measuring the quenching of an Mca-peptide standard at each substrate concentration. Data were fitted directly to the Michaelis-Menten equation. Peptide cleavage sites were confirmed by Edman sequencing of the reaction products.
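The continuous-assay analysis (inner-filter correction followed by a direct Michaelis-Menten fit) can be sketched with SciPy's `curve_fit`. The sketch assumes the correction factor at each substrate concentration is the ratio of the Mca-peptide standard's fluorescence alone to its fluorescence in the presence of that substrate concentration, which is one reading of the method described. The helper names and all numbers are hypothetical, and concentrations are in μM.

```python
import numpy as np
from scipy.optimize import curve_fit


def michaelis_menten(s, vmax, km):
    """Initial rate v = Vmax [S] / (Km + [S])."""
    return vmax * s / (km + s)


def fit_kinetics(s, v_observed, standard_free, standard_with_substrate, enzyme_conc):
    """Correct rates for the inner filter effect, then fit kcat and Km.

    standard_free: fluorescence of the Mca-peptide standard alone.
    standard_with_substrate: fluorescence of the same standard measured in
    the presence of each substrate concentration (attenuated signal).
    Substrate and enzyme concentrations must share units (here uM).
    """
    s = np.asarray(s, dtype=float)
    correction = np.asarray(standard_free, dtype=float) / np.asarray(
        standard_with_substrate, dtype=float
    )
    v = np.asarray(v_observed, dtype=float) * correction
    (vmax, km), _ = curve_fit(michaelis_menten, s, v, p0=[v.max(), np.median(s)])
    kcat = vmax / enzyme_conc
    return kcat, km, kcat / km


# Synthetic example: kcat = 10 s^-1, Km = 5 uM, [E] = 2 nM = 0.002 uM, with a
# signal attenuation that increases with substrate concentration.
s = np.array([0.5, 1, 2, 4, 8, 16, 32], dtype=float)
quench = 1.0 / (1.0 + 0.05 * s)
v_obs = michaelis_menten(s, 10 * 0.002, 5.0) * quench
kcat, km, efficiency = fit_kinetics(s, v_obs, np.ones_like(s), quench, 0.002)
print(f"kcat = {kcat:.2f} s^-1, Km = {km:.2f} uM, kcat/Km = {efficiency:.2f} uM^-1 s^-1")
```

Fitting the untransformed rates directly, as the text describes, avoids the error distortion introduced by linearizations such as Lineweaver-Burk plots.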

Analysis of MKK cleavage. For in vitro MKK cleavage, J774A.1 cells were lysed in 0.5% (v/v) Igepal CA-630, 20 mM HEPES, pH 7.4, 100 mM NaCl, 1 mM DTT, 5% (v/v) glycerol, 1 mM PMSF, and 4 μg ml-1 each of leupeptin, pepstatin and aprotinin. LF was preincubated for 30 min at 25 °C with varying concentrations of inhibitor before the addition of J774A.1 cell lysate. After an additional 30 min the reaction was quenched by adding SDS-PAGE loading buffer. To analyze cleavage in cultured cells, J774A.1 cells in six-well plates were pretreated with GM6001 (Calbiochem) or DMSO carrier alone (0.2% (v/v) final concentration in complete media) for 30 min at 37 °C before adding PA (to 0.5 μg ml-1) and LF (to the indicated concentration). Cells were incubated at 37 °C for an additional 90 min, washed once with PBS, lysed directly in SDS-PAGE loading buffer (100 μl per well) and boiled for 10 min. Samples were fractionated by SDS-PAGE and transferred to PVDF membrane for immunoblotting with anti-MKK-3 (Santa Cruz Biotechnology C-19) or anti-MKK-1 N terminus (Upstate Biotechnology, catalog no. 06-269). MKK-3 cleavage was quantified using NIH Image software (http://rsb.info.nih.gov/nih-image/).
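MKK-3 cleavage quantified by densitometry reduces to comparing the intact band in each treated lane with an untreated control. A minimal sketch follows, assuming background-subtracted integrated band intensities have been exported from the image analysis software and that cleavage is reported as the percentage loss of intact MKK-3 signal. Both the normalization and the intensities below are illustrative assumptions.

```python
def percent_cleaved(intact_signal, control_signal, background=0.0):
    """Percent of MKK-3 cleaved, from integrated band intensities.

    intact_signal: intact MKK-3 band in a treated lane.
    control_signal: the same band in the untreated (no LF) lane.
    background: local background subtracted from both (hypothetical).
    """
    treated = max(intact_signal - background, 0.0)
    control = control_signal - background
    if control <= 0:
        raise ValueError("control band must be above background")
    return 100.0 * (1.0 - treated / control)


# Hypothetical inhibitor dose series (uM inhibitor -> intact band signal).
control = 1000.0
lanes = {0.0: 150.0, 1.0: 400.0, 10.0: 850.0}
for dose, signal in lanes.items():
    print(f"{dose:>5} uM inhibitor: {percent_cleaved(signal, control):.0f}% cleaved")
```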

Lethal toxin assays. J774A.1 cells were plated in 96-well dishes at 3 × 10^5 cells per well and allowed to recover for 16 h, after which the medium was removed and replaced with fresh complete medium (100 μl per well) containing the indicated concentration of GM6001 or carrier alone (0.2% (v/v) DMSO). After 30 min, PA and/or LF were added to the indicated concentrations and incubation continued for an additional 4 h. To assay viability, 10 μl of 5 mg ml-1 MTT in PBS was added to each well, and incubation was continued for 2 h before aspirating the supernatant and extracting with 0.1 M HCl in isopropanol. Absorbance at 570 nm, with a background correction at 690 nm, was determined in an absorbance plate reader.
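The MTT readout (absorbance at 570 nm corrected against 690 nm) is commonly converted to percent viability relative to untreated wells. The sketch below assumes that normalization, plus optional cell-free blank wells; neither is specified in the text, and the absorbance values are invented.

```python
from statistics import mean


def corrected_absorbance(a570, a690):
    """A570 with the 690 nm background reading subtracted."""
    return a570 - a690


def percent_viability(treated, untreated, blank=None):
    """Mean viability of treated wells relative to untreated wells, in percent.

    Each argument is a list of (A570, A690) pairs, one per well. The optional
    blank wells (medium plus MTT, no cells) are a hypothetical extra control.
    """
    offset = mean(corrected_absorbance(*w) for w in blank) if blank else 0.0
    t = mean(corrected_absorbance(*w) for w in treated) - offset
    u = mean(corrected_absorbance(*w) for w in untreated) - offset
    return 100.0 * t / u


# Invented readings for two treated (PA + LF) wells and two untreated wells.
treated = [(0.55, 0.05), (0.45, 0.05)]
untreated = [(0.95, 0.05), (0.85, 0.05)]
print(f"viability = {percent_viability(treated, untreated):.1f}%")
```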

Crystallization. LF wild-type and E687C active site mutant protein crystals were grown in 1.7 M (NH4)2SO4, 0.2 M Tris-HCl, pH 8.0, 2 mM EDTA by the hanging-drop vapor diffusion method at 25 ± 4 °C, using a protein concentration of 13 mg ml-1 (ref. 31). Cocrystals of LF with GM6001 grew under similar conditions. All crystals used are monoclinic, in space group P21, with unit cell dimensions a = 96.7 Å, b = 137.4 Å, c = 98.3 Å, α = 90°, β = 98.0°, γ = 90°, and contain two molecules per asymmetric unit. In general, similar features were observed at the two active sites, but the density for molecule B was stronger.
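The cell dimensions above allow a quick check of crystal packing through the Matthews coefficient, V_M = V/(Z·M), where V is the unit-cell volume (a·b·c·sin β for a monoclinic cell), Z the number of molecules per cell and M the molecular mass. The sketch assumes a mass of about 90 kDa for LF, a value not stated in this section, and uses the standard approximation for solvent content, 1 - 1.23/V_M.

```python
import math


def monoclinic_cell_volume(a, b, c, beta_deg):
    """Unit-cell volume (A^3) of a monoclinic cell: V = a * b * c * sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))


def matthews(volume, mass_da, molecules_per_asu, symmetry_ops):
    """Matthews coefficient V_M (A^3/Da) and approximate solvent fraction."""
    vm = volume / (mass_da * molecules_per_asu * symmetry_ops)
    return vm, 1.0 - 1.23 / vm


volume = monoclinic_cell_volume(96.7, 137.4, 98.3, 98.0)
# Assumed LF mass of ~90 kDa; P21 has 2 symmetry operators; 2 molecules per ASU.
vm, solvent = matthews(volume, 90_000, molecules_per_asu=2, symmetry_ops=2)
print(f"V = {volume:,.0f} A^3, V_M = {vm:.2f} A^3/Da, solvent ~ {solvent:.0%}")
```

Under these assumptions V_M is about 3.6 Å3/Da (roughly 66% solvent), within the range commonly observed for protein crystals, so two LF molecules per asymmetric unit is a self-consistent packing.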

LF-substrate and LF-inhibitor complexes. Native LF or LF E687C monoclinic P21 single crystals were harvested and bathed in several rounds of crystallization buffer prior to soaking in their respective target peptide or inhibitor solutions. Soaks were done at room temperature (23 ± 2 °C). The treated crystals were then individually flash-frozen in liquid nitrogen. All data were collected at 100 K in a nitrogen cryostream.
The wild-type LF-LF20 peptide complex was obtained by soaking crystals in a solution of 10 mM LF20, 1.8 M (NH4)2SO4, 0.2 M Tris-HCl, pH 8.0, 2 mM EDTA for 8 min. Each crystal was then transferred into a cryoprotectant solution of 10 mM LF20, 2.4 M (NH4)2SO4, 0.2 M Tris-HCl, pH 8.0, 2 mM EDTA, 25% (v/v) glycerol, and bathed for a further 1 min before mounting in a cryoloop and flash-freezing. The LF(E687C)-LF20-Zn2+ crystal complex was first soaked in a solution of 1 mM ZnSO4, 1.8 M (NH4)2SO4, 0.2 M Tris-HCl, pH 8.0 for 5 min, followed by the treatment described for the wild-type LF-LF20 complex.

The LF-SHAc-YPM-Zn2+ inhibitor complex was obtained by soaking crystals in 1 mM ZnSO4, 1.8 M (NH4)2SO4, 0.2 M Tris-HCl, pH 8.0 for 5 min; then in 5 mM SHAc-YPM, 1.8 M (NH4)2SO4, 0.2 M Tris-HCl, pH 8.0 for a further 5 min; and then in 5 mM SHAc-YPM, 2.4 M (NH4)2SO4, 0.2 M Tris-HCl, pH 8.0, 2 mM EDTA, 25% (v/v) glycerol for 1 min before mounting and freezing.

The LF-GM6001 and LF(E687C)-GM6001 inhibitor complex crystals were grown from a 1:2 molar ratio of LF to inhibitor and crystallized as for native LF. Crystals were soaked in 1 mM ZnSO4, 1.8 M (NH4)2SO4, 0.2 M Tris-HCl, pH 8.0 for 5 min, then in 0.1 mM GM6001 (0.7% (v/v) DMSO), 1.8 M (NH4)2SO4, 0.2 M Tris-HCl, pH 8.0 for 2 min, and finally in 0.1 mM GM6001 (0.7% (v/v) DMSO), 2.4 M (NH4)2SO4, 0.2 M Tris-HCl, pH 8.0, 2 mM EDTA, 25% (v/v) glycerol for <1 min before mounting and freezing. The LF(E687C)-GM6001-Zn2+ inhibitor complex crystal was prepared by the same method, starting from an LF(E687C)-GM6001 cocrystal. No substantial differences in target binding or active site conformation were observed between the wild-type and mutant LF-GM6001-Zn2+ complexes (residue 687 is not involved directly in inhibitor or zinc binding). As the LF(E687C)-GM6001-Zn2+ complex gave higher-resolution data, this complex was used in further refinement.

Data collection. Data for the LF-LF20, LF(E687C)-LF20-Zn2+ and LF-SHAc-YPM-Zn2+ complexes were collected at the Stanford Synchrotron Radiation Laboratory (SSRL, Menlo Park, California, USA) on beamlines 1-5 (wavelength = 1.07 Å), 9-1 (wavelength = 0.98 Å) and 7-1 (wavelength = 1.08 Å), respectively. Data for the LF(E687C)-GM6001-Zn2+ complex were collected at the National Synchrotron Light Source (NSLS, Brookhaven, New York, USA) on beamline X12C (wavelength = 0.97 Å). X-ray diffraction data were collected for LF-LF20, LF(E687C)-LF20-Zn2+, LF-SHAc-YPM-Zn2+ and LF(E687C)-GM6001-Zn2+ to resolution limits of 2.85 Å, 2.80 Å, 3.50 Å and 2.70 Å, respectively.

Data processing and refinement. Crystallographic data were processed using the HKL package40. Refinement and model building were done in CNS41 and O42. The high-resolution model of LF (PDB entry 1J7N) was used as the starting model. The model was put through rigid-body refinement and then minimization, and initial maps were calculated. Additional electron density at >1.0σ in 2Fo - Fc maps and >2σ in Fo - Fc maps was clearly seen in the active site groove of LF in all cases. The model of the peptide or inhibitor with zinc was then built into this position and further refined in CNS41. Difference maps of the LF models, both including and omitting the peptide or inhibitor, were calculated in subsequent rounds of model rebuilding and refinement. Composite omit maps were also used. The final R-factors for each complex were as follows: LF-LF20 (Zn2+-free), Rfree = 28.3% and Rwork = 23.1%; LF(E687C)-LF20-Zn2+, Rfree = 27.7% and Rwork = 23.0%; LF-SHAc-YPM-Zn2+, Rfree = 29.5% and Rwork = 23.2%; and LF(E687C)-GM6001-Zn2+, Rfree = 26.8% and Rwork = 23.0%. The final models fall within or exceed the limits of all the quality criteria of PROCHECK from the CCP4 suite.
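The Rwork and Rfree values quoted above follow the R-factor definition in the footnote to Table 4, with about 5% of reflections set aside as the test set. The sketch below computes both from arrays of observed and calculated amplitudes. It assumes Fcalc is already on the same scale as Fobs (refinement programs such as CNS scale them internally), and the random 5% split only illustrates how a test set is flagged; the function names are hypothetical.

```python
import numpy as np


def r_factor(f_obs, f_calc):
    """R = sum ||Fobs| - |Fcalc|| / sum |Fobs| (no scaling applied here)."""
    f_obs = np.abs(np.asarray(f_obs, dtype=float))
    f_calc = np.abs(np.asarray(f_calc, dtype=float))
    return np.sum(np.abs(f_obs - f_calc)) / np.sum(f_obs)


def split_free_set(n_reflections, fraction=0.05, seed=0):
    """Boolean mask flagging a random ~5% of reflections as the Rfree test set."""
    rng = np.random.default_rng(seed)
    return rng.random(n_reflections) < fraction


def r_work_and_free(f_obs, f_calc, free_mask):
    """Rwork over the working set (~free_mask) and Rfree over the test set."""
    f_obs = np.asarray(f_obs, dtype=float)
    f_calc = np.asarray(f_calc, dtype=float)
    return (r_factor(f_obs[~free_mask], f_calc[~free_mask]),
            r_factor(f_obs[free_mask], f_calc[free_mask]))
```

Because the test reflections never enter the refinement target, an Rfree that stays close to Rwork (as for all four complexes here) suggests the model is not overfitted.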

Coordinates. Coordinates and structure factors have been deposited in the Protein Data Bank (accession codes: 1PWQ, LF-YPM-Zn2+; 1PWU, LF(E687C)-GM6001-Zn2+; 1PWV, LF-LF20; 1PWW, LF(E687C)-LF20-Zn2+).
Anthrax toxin consists of three proteins: Protective Antigen (PA), Lethal Factor (LF) and Edema Factor (EF)1. The first critical step in the entry of the toxin into cells is the recognition by PA of a receptor on the surface of the target cell. Subsequent cleavage of receptor-bound PA enables EF and LF to bind and form a heptameric PA63 pre-pore, which triggers endocytosis. Upon acidification of the endosome, PA63 forms a pore that inserts into the membrane and translocates EF and LF into the cytosol. Two closely related host cell receptor molecules, TEM8 and CMG2, bind to PA with high affinity and are required for toxicity. Here, we report the crystal structure of the PA-CMG2 complex at 2.5 Å resolution. The structure reveals an extensive receptor-pathogen interaction surface that mimics the non-pathogenic recognition of the extracellular matrix by integrins. The binding surface is closely conserved in the two receptors and across species, but quite different in the integrin domains, explaining the specificity of the interaction. CMG2 engages two domains of PA, and modeling of the receptor-bound PA63 heptamer6-8 suggests that the receptor acts as a pH-sensitive chaperone to ensure accurate and timely membrane insertion. The structure will provide new leads for the discovery of anthrax anti-toxins, and will aid in the design of cancer therapeutics.


Both TEM8 and CMG2 contain a domain that is homologous to the I domains of integrins, which comprise a Rossmann-like α/β fold with a “metal ion-dependent adhesion site” (MIDAS) motif on their upper surface. The PA monomer is a long, slender molecule comprising four distinct domains. Two of these, domains II and IV, pack together at the base of PA and engage the upper surface of the CMG2 I domain surrounding the MIDAS motif (Fig. 1), burying a large protein surface (1900 Å2), consistent with the very high affinity (sub-nanomolar Kd) of this interaction11. The I domain adopts the “open” conformation typical of integrin-ligand complexes. PA mimics the ligand recognition mechanism of the integrins by contributing an aspartic acid side chain that completes the coordination sphere of the MIDAS magnesium ion, as predicted by mutagenesis13,14 (Fig. 2). This single interaction contributes substantially to binding, since mutation of the aspartic acid to asparagine completely eliminates toxicity, as does mutation of a metal-coordinating residue on the receptor.

However, the MIDAS bond does not explain the specificity of the interaction, as it does not distinguish between CMG2 and integrins. Specificity arises from two further interactions. First, PA domain IV docks onto the surface of CMG2 adjacent to the MIDAS motif. Domain IV comprises a β-sandwich with an immunoglobulin-like fold, but the mode of binding is quite different from antibody-antigen recognition. One of the receptor loops (α2-α3) emanating from the MIDAS motif forms a hydrophobic ridge that inserts into a groove formed by one edge of the β-sandwich, where its hydrophobic core is exposed. Flanking this ridge-in-groove are two further loops from CMG2, which make a number of specific polar interactions and salt bridges (Figs. 3, 4a). Together with the MIDAS contact, CMG2 and PA domain IV bury 1300 Å2 of surface area, a value very similar to that of two integrin-ligand interactions, which have affinities in the sub-micromolar range. CMG2 and TEM8 are 60% identical in their I domains, and homology modeling based on the CMG2 structure shows that this ridge is well conserved in TEM8 and in their murine counterparts, implying that they will bind PA in an identical fashion; however, the structure and sequence of the ridge are very different in integrins, explaining their weak binding.

The interaction between PA domain II and CMG2 was unexpected. A β-hairpin from a well-ordered loop (β3-β4) at the bottom of domain II inserts into a pocket on the receptor, burying 600 Å2 of protein surface (Fig. 4b). This additional contact rationalizes the very high affinity of the PA-CMG2 interaction. The pocket is adjacent to the MIDAS motif and is formed by two exposed tyrosines (119 and 158) and the β4-α4 loop, which line the sides of the pocket, and by a histidine at its base. The pocket is conserved in TEM8 but does not exist in the integrin I domains, thus providing further specificity. The importance of this loop was shown by systematic mutation of the PA molecule, which revealed three mutations in this loop that reduced toxicity by >100-fold, including G342 at the tip of the β-hairpin that inserts into the pocket.
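The buried surface areas quoted above (1900, 1300 and 600 Å2) are differences in solvent-accessible surface area between the separated partners and the complex. A minimal Shrake-Rupley sketch with a 1.4-Å probe is shown below. It assumes atoms are supplied as (x, y, z, radius) rows and reports buried area as SASA(A) + SASA(B) - SASA(AB); some authors quote half of that value, and the text does not say which convention or program was used. The function names are hypothetical.

```python
import numpy as np


def sphere_points(n=960):
    """Roughly uniform unit-sphere points from a golden-spiral construction."""
    i = np.arange(n) + 0.5
    phi = np.arccos(1 - 2 * i / n)
    theta = np.pi * (1 + 5 ** 0.5) * i
    return np.column_stack((np.cos(theta) * np.sin(phi),
                            np.sin(theta) * np.sin(phi),
                            np.cos(phi)))


def sasa(atoms, probe=1.4, n_points=960):
    """Total solvent-accessible surface area (A^2) by the Shrake-Rupley method.

    atoms: array-like of shape (N, 4) holding x, y, z and van der Waals radius.
    """
    atoms = np.asarray(atoms, dtype=float)
    centers, radii = atoms[:, :3], atoms[:, 3] + probe
    unit = sphere_points(n_points)
    total = 0.0
    for k, (c, r) in enumerate(zip(centers, radii)):
        pts = c + r * unit
        others = np.delete(np.arange(len(atoms)), k)
        # A surface point is buried if it lies inside any other expanded sphere.
        d2 = ((pts[:, None, :] - centers[others][None, :, :]) ** 2).sum(axis=2)
        buried = (d2 < radii[others][None, :] ** 2).any(axis=1)
        total += 4 * np.pi * r ** 2 * (~buried).mean()
    return total


def buried_area(atoms_a, atoms_b, probe=1.4):
    """Area buried on complex formation: SASA(A) + SASA(B) - SASA(A + B)."""
    complex_atoms = np.vstack((atoms_a, atoms_b))
    return sasa(atoms_a, probe) + sasa(atoms_b, probe) - sasa(complex_atoms, probe)


# Two toy "atoms" (radius 1.8 A) close enough for their probe spheres to overlap.
print(f"buried: {buried_area([[0, 0, 0, 1.8]], [[3.0, 0, 0, 1.8]]):.1f} A^2")
```

Real calculations use per-element radii and many thousands of atoms, but the accounting is the same: only surface that becomes inaccessible to the probe on complex formation counts as buried.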

Biophysical studies of channel conductance by PA63 pores indicate that the entire region encompassed by residues 275-352 (strands β2 and β3 and flanking loops; see Fig. 3) in domain II rearranges to form a long β-hairpin that lines the channel lumen. This requires that the β2 and β3 strands and the β3-β4 loop peel away from the side of domain II. For this to happen, domain IV, which packs against them in the pre-pore, must separate at least transiently from domain II. Thus, by binding to both domains II and IV, CMG2 may restrain the conformational changes that lead to membrane insertion. Indeed, while PA63 heptamers insert into artificial planar bilayers (in the absence of receptor) when the pH is reduced to 6.5, the pH requirement for receptor-mediated insertion on cells is more stringent, requiring a pH of 5.5. We therefore propose that the binding of CMG2 to the β3-β4 loop stabilizes the pre-pore conformation at neutral pH; that is, the receptor may act as a chaperone to prevent premature membrane insertion on the cell surface prior to endocytosis. The titration of histidines is implicated in triggering the conformational switch. The histidine at the base of the CMG2 pocket has no hydrogen-bonding partners and is close to an arginine side chain from the β3-β4 loop of PA. Protonation of this histidine provides a plausible trigger for the release of domain II from CMG2 in the acidified endosome. Moreover, the structure of the β3-β4 loop is pH sensitive, since it is ordered in crystals of PA grown at pH 7.5 (in the absence of receptor) but disordered in crystals grown at pH 6.0 (ref. 6).
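The proposed histidine trigger can be illustrated with the Henderson-Hasselbalch relation, which gives the protonated fraction of a histidine side chain as 1/(1 + 10^(pH - pKa)). The sketch assumes a pKa of 6.0, close to the textbook value for a free histidine side chain; the actual pKa of the CMG2 histidine is not reported here and may be shifted by its protein environment.

```python
def fraction_protonated(ph, pka=6.0):
    """Henderson-Hasselbalch: fraction of histidine side chains carrying H+.

    The default pKa of 6.0 is an assumed free-histidine value, not a measured
    value for the CMG2 pocket histidine.
    """
    return 1.0 / (1.0 + 10 ** (ph - pka))


# pH values discussed in the text: neutral, planar-bilayer insertion, cell insertion.
for ph in (7.0, 6.5, 5.5):
    print(f"pH {ph}: {fraction_protonated(ph):.2f} protonated")
```

Under that assumption, the protonated fraction rises from under 10% at pH 7.0 to roughly three quarters at pH 5.5, the pH required for receptor-mediated insertion, which is the kind of sharp switch a titratable trigger would need.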


It is straightforward to model the 7:7 heptameric PA63-CMG2 complex, since the crystal structure of the “pre-pore” is known6 (Fig. 5). Seven CMG2 I domains lie at the base of the heptameric “cap”, increasing its height by 35 Å. The I domains are well separated, consistent with a 7:7 binding stoichiometry11, and their N- and C-termini point downwards, towards the membrane. In the transition from pre-pore to pore, the seven hairpin loops, one from each PA monomer, are predicted to create a 14-stranded membrane-spanning β-barrel. Assuming an α-hemolysin-like structure17, the barrel extends ~75 Å below the I domains, with the bottom 30 Å spanning the membrane.

This leaves ~40 Å between the bottom of the I domains and the membrane surface, which may be occupied by the second domain of CMG2, comprising ~100 residues between the I domain and its C-terminal transmembrane sequence. Thus, the receptor may support the heptamer at the correct height above the membrane for accurate membrane insertion, which is stoichiometric on cells but less efficient in the absence of receptor16.
Soluble versions of the CMG2 and TEM8 I domains protect against anthrax toxicity by acting as decoys, and our structure will allow for the design of new therapeutic agents that disrupt the PA-receptor interaction. TEM8 is strongly upregulated on the surface of endothelial cells that line the blood vessels of tumours, while CMG2 is widely expressed in most tissues19. Anthrax toxin is being developed as an anti-tumour agent20, and our structure will allow the design of PA molecules that bind better to TEM8 than to CMG2, thus minimizing the toxic side effects from binding to CMG2 in normal tissues.


Methods 

Protein  expression  and  purification 

PA was prepared as previously described. The I domain of CMG2 was cloned as an N-terminal His-tag fusion in pET15b (Novagen) and expressed in E. coli strain BL21(DE3). Following induction of cell cultures with 0.5 mM IPTG for 2 h at 37°C, CMG2 was purified from the soluble fraction of the cell lysate by nickel affinity chromatography (HiTrap chelating HP, Pharmacia), followed by removal of the tag with thrombin (Sigma), ion exchange (HiTrap monoQ, Pharmacia) and gel filtration (Superdex S75, Pharmacia), affinity removal of thrombin (HiTrap benzamidine FF, Pharmacia) and incubation in a buffer containing 100 mM EDTA to strip bound metal. The final product was dialysed, concentrated to 15-20 mg/ml and flash-frozen in 150 mM NaCl, 20 mM TrisCl pH 7.5; it comprises residues 40-218 of CMG2386 (accession number AAK77222) plus an N-terminal extension of sequence GSHMLEDPRG resulting from the cloning strategy. The molecular weight was confirmed by MALDI-TOF mass spectrometry. To prepare the PA-CMG2 complex, PA was mixed at a final concentration of 4 mg/ml with a 3-fold molar excess of CMG2 and a 2-fold excess of MnCl2, incubated for 20 min at room temperature and purified by gel filtration (Superdex S200, Pharmacia). The complex was extensively dialysed, buffer-exchanged and concentrated to 6 mg/ml in 20 mM TrisCl pH 7.5, 10 μM MnCl2 for crystallization trials.
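The mixing step for complex formation is simple molar arithmetic, sketched below. The sketch assumes molecular masses of about 83 kDa for PA and about 21 kDa for the tagged CMG2 I-domain construct (illustrative values not given in the text; the MALDI-TOF mass would be the one to use), and it treats the 2-fold MnCl2 excess as relative to PA, which the text leaves ambiguous. The function names are hypothetical.

```python
def molar_conc_uM(mg_per_ml, mass_da):
    """Convert a mass concentration (mg/ml, i.e. g/l) to micromolar."""
    return mg_per_ml / mass_da * 1e6


def mixing_plan(pa_mg_ml, pa_mass_da, cmg2_mass_da, cmg2_excess=3.0, mn_excess=2.0):
    """Final concentrations needed for a given PA concentration."""
    pa_uM = molar_conc_uM(pa_mg_ml, pa_mass_da)
    cmg2_uM = cmg2_excess * pa_uM
    return {
        "PA (uM)": pa_uM,
        "CMG2 (uM)": cmg2_uM,
        "CMG2 (mg/ml)": cmg2_uM * cmg2_mass_da / 1e6,
        "MnCl2 (uM)": mn_excess * pa_uM,  # assumes the excess is relative to PA
    }


plan = mixing_plan(4.0, pa_mass_da=83_000, cmg2_mass_da=21_000)
for name, value in plan.items():
    print(f"{name}: {value:.1f}")
```

With these assumed masses, 4 mg/ml PA is roughly 48 μM, so a 3-fold excess needs about 3 mg/ml of CMG2 in the final mixture.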

Crystallization  and  structure  solution 

Needle-like crystals grew to a size of 10 × 10 × 500 μm in 5-10 days at room temperature in a sitting-drop vapour diffusion set-up using a reservoir buffer containing 50-100 mM CHES pH 9.0-9.2, 25% PEG400. Crystals were flash-frozen at 4°C in liquid nitrogen using the crystallization buffer with 40% PEG400 as a cryoprotectant prior to diffraction analysis. They belong to space group P212121 with unit cell parameters a = 88.2 Å, b = 94.2 Å, c = 135.6 Å. There is one PA-CMG2 complex in the asymmetric unit. A complete native data set to 2.5 Å was collected at beamline 9-1 at SSRL on an ADSC Quantum-315 CCD detector and processed with the HKL package21 (see Table 1). PA was positioned in the unit cell by molecular replacement (PDB ID code 1ACC)6 using MOLREP, and refined with REFMAC version 5.0 (ref. 22). Density for the MIDAS Mn2+ ion and upper loops of the receptor was evident in this map, and a molecule of CMG2 (PDB ID code 1SHT)23 was manually placed in the electron density. Model building was performed with O24 and TURBOFRODO25, and the solvent structure was built with ARP/wARP 6.0 (ref. 26). Although the random errors in the diffraction data are high, owing to the small crystal size, the final refinement statistics and maps are excellent (Table 1). The final R factors are Rfree = 26.6% and Rwork = 20.7% overall, and Rfree = 37.2% and Rwork = 27.5% in the outer resolution bin, with RMS deviations from ideal values of 0.017 Å for bond lengths and 1.65° for angles. Stereochemistry is excellent as assessed with PROCHECK22, and the model is consistent with composite simulated annealing omit maps (3000°C) calculated in CNS27. The model comprises residues 16-735 of PA, with the exception of three loops (residues 159-174, 276-287 and 304-319) for which no electron density was observed; residues 41-210 of CMG2; 139 water molecules; 2 Ca2+ ions in PA domain I; 2 Na+ ions; one PEG molecule; and one Mn2+ ion at the MIDAS site. Although the MIDAS metal ion in vivo is likely to be Mg2+, we have previously shown for integrin I domains that the stereochemistry of the open conformation is not dependent on the nature of the metal ion. PA domain I (residues 16-258) undergoes a small rotation as a consequence of crystal constraints when compared with the structure of isolated PA, such that the rmsd values for the superposition of the two molecules are 1.44, 0.58 and 0.79 Å for residues 16-735, 259-735 and 16-258, respectively. CMG2 residues 41-200 superimpose on the isolated protein with an rmsd of 0.60 Å, while the C-terminal helix (residues 201-210) shifts downward by one helical turn.
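The rmsd values above come from least-squares superposition of matched atoms. A sketch of the Kabsch algorithm is given below, assuming matched coordinate arrays (for example Cα atoms for the same residue range in both models). The toy example uses synthetic coordinates to show why a small rigid rotation of one domain raises the whole-molecule rmsd while leaving each domain's own rmsd near zero, the same qualitative pattern as the PA values quoted above.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between two matched (N, 3) coordinate sets after optimal superposition.

    Both sets are centred, the optimal rotation is found by SVD of the 3x3
    covariance matrix (Kabsch algorithm), and the RMSD of the fit is returned.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against an improper rotation
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    p_fit = p @ rotation.T
    return float(np.sqrt(((p_fit - q) ** 2).sum(axis=1).mean()))


# Synthetic two-domain "molecule"; domain 1 is rotated 10 degrees about its centroid.
rng = np.random.default_rng(0)
dom1 = rng.normal(scale=5.0, size=(50, 3))
dom2 = rng.normal(scale=5.0, size=(50, 3)) + np.array([20.0, 0.0, 0.0])
angle = np.radians(10.0)
rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
               [np.sin(angle), np.cos(angle), 0.0],
               [0.0, 0.0, 1.0]])
centre = dom1.mean(axis=0)
dom1_moved = (dom1 - centre) @ rz.T + centre
model_a = np.vstack((dom1, dom2))
model_b = np.vstack((dom1_moved, dom2))
print(f"domain 1: {kabsch_rmsd(dom1, dom1_moved):.2f} A, "
      f"domain 2: {kabsch_rmsd(dom2, dom2):.2f} A, "
      f"whole: {kabsch_rmsd(model_a, model_b):.2f} A")
```

The determinant check matters: without it, the SVD solution can return a reflection rather than a rotation when the two coordinate sets are poorly matched.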
Figure 1 Structure of the PA-CMG2 complex. Two orthogonal views are shown as ribbons. PA is coloured by domain (I-IV). CMG2 is in blue. The metal ion is shown as a magenta ball. All molecular graphics images were generated using the UCSF Chimera package28 (http://www.cgl.ucsf.edu/chimera).


Figure 2 Comparison of the MIDAS motifs of the a, PA-CMG2 and b, integrin α2β1-collagen8 complexes. Coordinating side chains and two water molecules (ω) are shown as ball-and-stick. The metal is shown in blue. Carbon and oxygen atoms from CMG2 and integrin are dark blue and red, and numbered in black for CMG2. D683 from domain IV of PA, and the analogous collagen glutamic acid, are in gold. Loops are shown as grey ribbons. Hydrogen bonds to the metal-bound waters are shown as dotted lines.
Figure 3 Intermolecular contacts between PA domains II and IV and CMG2. Contacting regions are coloured blue and green for PA domain IV and CMG2, respectively. The β3-β4 loop, β2 and β3 strands and β2-β3 loop of PA domain II, which are implicated in pore formation, are highlighted in red. The β2-β3 loop, which is disordered in monomeric PA, is shown as a dashed line. The MIDAS metal is labelled “M”. The side chains of PA D683 and CMG2 H121 are shown as ball-and-stick in gold and cyan, respectively.


Figure 4 Key elements of the PA-CMG2 interaction. a, Solvent-accessible surface (probe radius 1.4 Å) of the PA domain IV groove, with key side chains from three CMG2 loops (β1-α1, blue; β2-β3, red; α2-α3, green) shown as ball-and-stick (C: yellow, O: red, N: blue). The green loop forms the top of the groove. The MIDAS metal is labelled (M). b, Solvent-accessible surface (probe radius 1.0 Å) showing the CMG2 pocket into which the PA β3-β4 loop (red ribbon) inserts. The pocket is formed by three CMG2 side chains (shown as ball-and-stick) and the β4-α4 loop (cyan).


Figure 5 Hypothetical model of the receptor-bound, membrane-inserted PA pore. The PA63 heptamer (red) is based on the pre-pore crystal structure6, with a hypothetical model of a membrane-spanning 14-stranded barrel17 formed by rearrangement in each monomer of the segment shown in red in Figure 3. Seven copies of the CMG2 I domain bound to the heptamer are in blue. The 40-Å gap may be occupied by a ~100-residue domain of CMG2, C-terminal to the I domain, which precedes its membrane-spanning sequence.


Table 1: Data collection and refinement statistics

  Space group                        P212121
  Unit cell (Å)                      a = 88.2, b = 94.1, c = 135.6
  Resolution (Å)                     30-2.5
  Wavelength (Å)                     0.892
  Rmerge (%)                         17.6 (89.1)
  I/σ                                11.5 (2.4)
  σ cutoff                           none
  Average redundancy                 5.3 (5.2)
  Completeness (%)                   99.9
  Mosaicity                          0.4
  Rwork (last shell)                 20.7 (27.5)
  Rfree (last shell)                 26.6 (37.2)
  σ cutoff                           none
  rmsd bond lengths (Å)              0.017
  rmsd bond angles (°)               1.65
  Ramachandran plot (residues, %)
    Most favoured                    655    86.3%
    Additionally allowed             101    13.3%
    Generously allowed               3      0.4%
    Disallowed                       0      0%

Values in parentheses refer to the highest resolution shell.